We have studied tt̄ production using multijet final states in pp̄ collisions at a center-of-mass
energy of 1.8 TeV, with an integrated luminosity of 110.3 pb⁻¹. In these final states, each top
quark decays exclusively to a bottom quark and a W boson, with the W bosons decaying
into quark-antiquark pairs. The analysis has been optimized using neural networks to achieve the
smallest expected fractional uncertainty on the tt̄ production cross section, and yields a cross section
of 7.1 ± 2.8 (stat) ± 1.5 (syst) pb, assuming a top quark mass of 172.1 GeV/c². Combining this
result with previous D0 measurements, where one or both of the W bosons decay leptonically, gives
a tt̄ production cross section of 5.9 ± 1.2 (stat) ± 1.1 (syst) pb.
I. INTRODUCTION


In the standard model, the top quark decays to a b
quark and a W boson, and the dominant decay of the W
boson is into a quark-antiquark pair. Events with a tt̄
pair can have both W bosons decaying to quarks. This
is referred to as the “all-jets” channel, and is expected to
account for 44% of the tt̄ production cross section.

The observation of top quark production in the
channels involving one or two leptons motivates us to
investigate tt̄ decays into other channels. D0 has measured
a top quark mass, m_t, of 172.1 ± 5.2 (stat) ± 4.9 (syst)
GeV/c² and a tt̄ production cross section of 5.6 ±
1.4 (stat) ± 1.2 (syst) pb, while CDF has measured
a mass of 175.9 ± 4.8 (stat) ± 4.9 (syst) GeV/c² and
a tt̄ production cross section of 7.6 pb. Recently,
CDF has reported on the all-jets channel, and finds
the tt̄ production cross section to be 10.1 pb and a
top quark mass of 186 ± 10 (stat) ± 12 (syst) GeV/c².

The work presented here is based on 110.3 ± 5.8 pb⁻¹
of data recorded between August 1992 and February 1996
at the Fermilab Tevatron collider, with a pp̄ center-of-mass
energy of 1.8 TeV. Assuming the branching ratio
and cross section predicted by the standard model, we
expect approximately 200 tt̄ → all-jets events in this data
sample.
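
The event-count estimate above follows from N = σ · B · L. The short Python sketch below only illustrates that arithmetic. The function names are ours, and the standard model cross section used for the estimate is not quoted in this section, so the sketch inverts the relation to show which cross section the quoted numbers (44% all-jets fraction, 110.3 pb⁻¹, about 200 events) would imply.

```python
def expected_events(cross_section_pb, branching_fraction, luminosity_pb_inv):
    """Expected event count N = sigma * B * L (pb times pb^-1 is dimensionless)."""
    return cross_section_pb * branching_fraction * luminosity_pb_inv


def implied_cross_section(n_events, branching_fraction, luminosity_pb_inv):
    """Cross section (in pb) that would yield n_events for the given B and L."""
    return n_events / (branching_fraction * luminosity_pb_inv)


if __name__ == "__main__":
    # Values quoted in the text: 44% all-jets fraction, 110.3 pb^-1, ~200 events.
    sigma = implied_cross_section(200, 0.44, 110.3)
    print(f"implied cross section: {sigma:.2f} pb")
```

Under these inputs the implied cross section is a little above 4 pb; this is a consistency check on the quoted numbers, not a statement of the prediction actually used.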

The signature for tt̄ production in the all-jets channel
is six or more high transverse momentum jets with
kinematic properties consistent with the top quark decay
hypothesis. At least two of these jets originate from
b quarks. The background to this signature consists of
events from other processes that can also produce six or
more jets. The tt̄ channel is one of the few examples of
multijet final states that are dominated by quarks rather
than gluons. This fact has motivated us to include the
characteristic differences between quark and gluon jets
in separating the top quark to all-jets signal from
background.

Interest in the all-jets decay channel of top quarks also
stems from the fact that, without any unobserved particles
in the final state, the all-jets mode is the most
kinematically constrained of all the top quark decay channels.
Furthermore, since the top quark is quite massive, decays
via charged Higgs may be possible. If channels such as
t → H⁺b have a significant branching fraction, the main
effect could be a deficit in the tt̄ final states with energetic
electrons or muons, relative to the all-jets channel.

II. OUTLINE OF METHOD 

The search for the top quark in the all-jets channel
began with the imposition of preliminary selection criteria
at the trigger stage, followed by more stringent criteria in
the offline analysis. As these initial criteria were not very
restrictive, the observed cross section, primarily from
QCD processes, was more than three thousand times
larger than the expected signal. The principal challenge
in the search was to develop a set of selection criteria that
could significantly improve the signal-to-background ratio,
and provide an estimate of the background remaining
after imposing any selection requirements.

The data sample consisted of over 600,000 events after
the initial selection criteria. Because of the small number
of tt̄ events expected in the presence of this large
background, and with only modest discrimination in any single
kinematic or topological property, traditional methods
of analysis were inadequate. The analysis would have
to involve many variables, which are likely to be highly
correlated. Neural networks were chosen as the appropriate
tool for handling many variables simultaneously.

The analysis relied on Monte Carlo simulations to
model the properties of tt̄ events. These simulations were
performed for different top quark masses, and the final
results interpolated to the mass measured by the D0
collaboration. We note that the tt̄ detection efficiency is
not strongly dependent on the assumed mass of the top
quark.

In contrast, the background model was determined
entirely from data. An advantage of the overwhelming
background-to-signal ratio is that the data provide an
almost pure background sample. This approach obviates
a number of concerns when calculating the background.
The background is predominantly QCD multijet production,
which involves higher-order processes that may not
be well modeled in a Monte Carlo simulation. Furthermore,
detector effects are implicitly included when data
are employed for the model of the background.

Soft-lepton tagging, using muons embedded in jets,
serves as a possible signature for the presence of a b quark
within the jet, and is referred to as b-tagging. By identifying
the muon from the semileptonic decay of a b quark
(or the sequential decay), b-tagging of jets improves the
signal-to-background ratio significantly. The tt̄ events are
tagged roughly 20% of the time, whereas the tag rate for
QCD multijet events with similar requirements is about
3%. Requiring the presence of a muon tag in the event
therefore provides nearly a factor of ten in background
rejection and a method to estimate this background.

The background calculation relied on being able to predict
the number of events that are b-tagged, based on
events without such tags. To make the untagged data
represent the background in this analysis, a way of
estimating the tagging rate in QCD events was needed.
This was done by constructing a “tag rate” function,
determined from data, that is applied to each jet
separately. This function is simply the probability for any
individual jet to have a muon tag. Application of the
tag rate function to each jet in untagged events gives the
background model for our final event sample. The presence
of tt̄ signal was identified by an excess observed in
the data above this background. This excess should be
small in the regions of the neural network output where
background dominates, but should be enhanced where
significant signal is expected.
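
The tag-rate procedure described above can be sketched as follows. This is a minimal Python illustration, not the analysis code: the parameterization in `tag_rate` is hypothetical (the real function is measured from data and is not given in this section), and combining the per-jet probabilities into an "at least one tagged jet" event weight is our assumption.

```python
def tag_rate(jet_et, jet_eta):
    """Hypothetical per-jet muon-tag probability (placeholder values only)."""
    if abs(jet_eta) > 1.0:          # assumed: no tags outside the central muon system
        return 0.0
    return min(0.02, 0.0002 * jet_et)


def event_tag_weight(jets, rate=tag_rate):
    """Probability that at least one jet in an untagged event would be tagged.

    jets is a list of (E_T, eta) pairs.
    """
    p_no_tag = 1.0
    for et, eta in jets:
        p_no_tag *= 1.0 - rate(et, eta)
    return 1.0 - p_no_tag


def predicted_tagged_background(untagged_events, rate=tag_rate):
    """Sum of event weights: the predicted number of tagged background events."""
    return sum(event_tag_weight(jets, rate) for jets in untagged_events)
```

Because each untagged event contributes a weight rather than a yes/no decision, the same weights can be used to fill any distribution (for example, a neural network output) to obtain its predicted background shape.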



FIG. 1. Isometric view of the D0 detector. 

This analysis employed two neural networks to extract
the final tt̄ cross section. The first had as its input variables
those parameters involving kinematic and topological
properties of the events that were highly correlated.
The output of this neural network was used as an input
variable to a second neural network, along with three
other inputs. These three inputs were the transverse
momentum (p_T) of the tagging muon, a discriminant based
on the widths of the jets, and a likelihood variable that
parameterized the degree to which an event was consistent
with the tt̄ decay hypothesis. These three variables
were less correlated than the kinematic variables used in
the first neural network. The tt̄ cross section was determined
from the output of this second neural network
by fitting the signal and background distributions of the
neural network output to the observed data.
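
As an illustration of the final step, the sketch below fits a binned data histogram with a linear combination of a signal template and a background template. It is a simplified stand-in: the fit method actually used is not specified in this section, and an unweighted least-squares fit is assumed here purely for clarity. The function name and the example histograms are hypothetical.

```python
def fit_templates(data, signal, background):
    """Least-squares fit of data ~ a * signal + b * background (binned).

    Solves the 2x2 normal equations directly and returns (a, b).
    """
    ss = sum(s * s for s in signal)
    bb = sum(b * b for b in background)
    sb = sum(s * b for s, b in zip(signal, background))
    sd = sum(s * d for s, d in zip(signal, data))
    bd = sum(b * d for b, d in zip(background, data))
    det = ss * bb - sb * sb
    if det == 0.0:
        raise ValueError("templates are degenerate; the fit is not unique")
    a = (sd * bb - bd * sb) / det
    b = (bd * ss - sd * sb) / det
    return a, b


if __name__ == "__main__":
    signal = [1.0, 2.0, 5.0, 10.0]        # hypothetical signal shape per bin
    background = [100.0, 50.0, 20.0, 5.0]  # hypothetical background shape per bin
    data = [1.5 * s + 0.8 * b for s, b in zip(signal, background)]
    print(fit_templates(data, signal, background))
```

If the signal template is normalized to a reference cross section, the fitted signal scale factor times that reference gives the measured cross section.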

III. THE D0 DETECTOR 

D0 is a multipurpose detector designed to study pp̄
collisions at the Fermilab Tevatron Collider. The detector
was commissioned during the summer of 1992. A
full description of the detector can be found in the
references. Here we describe the properties of the detector that
are most relevant to the search in the all-jets channel. An
isometric view of the detector is shown in Fig. 1.

A. Tracking system 

The tracking system consists of a vertex drift chamber,
a transition radiation detector, a central drift chamber,
and two forward drift chambers. The system provides
charged-particle tracking over the pseudorapidity region
|η| < 3.2, where η = −ln[tan(θ/2)]; θ and φ are,
respectively, the polar and azimuthal angles relative to the
proton beam axis. The resolution for charged particles is
2.5 mrad in φ and 28 mrad in θ. The position of the
interaction vertex along the beam direction (z) is determined
typically to an accuracy of 8 mm.
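
For reference, the pseudorapidity definition used throughout can be written as a pair of small Python helpers. The function names are ours; only the formula η = −ln[tan(θ/2)] comes from the text.

```python
import math


def pseudorapidity(theta):
    """eta = -ln(tan(theta / 2)), with theta the polar angle in radians."""
    return -math.log(math.tan(theta / 2.0))


def polar_angle(eta):
    """Inverse relation: theta = 2 * atan(exp(-eta)), in radians."""
    return 2.0 * math.atan(math.exp(-eta))


if __name__ == "__main__":
    # Polar angle corresponding to the edge of the tracking coverage, |eta| = 3.2.
    print(math.degrees(polar_angle(3.2)))
```

A particle emitted perpendicular to the beam (θ = π/2) has η = 0, and the tracking limit |η| = 3.2 corresponds to a polar angle of a few degrees from the beam axis.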

B. Calorimeter 

The liquid-argon calorimeter, using uranium and stainless
steel/copper absorber, is divided into three parts:
a central calorimeter and two end calorimeters. Each
part consists of an inner electromagnetic section, a fine
hadronic section, and a coarse hadronic section, housed
in a stainless steel cryostat. The intercryostat detector
consists of scintillator tiles inserted in the space between
the central and end calorimeter cryostats. In addition,
“massless gaps”, installed inside both central and end
calorimeters, are active readout cells, without absorber
material, located inside the cryostat adjacent to the cryostat
walls. The intercryostat detector and massless gaps
improve the energy resolution for jets that straddle two
cryostats. The calorimeter covers the pseudorapidity
range |η| < 4.2, and has a typical segmentation of 0.1
× 0.1 in Δη × Δφ. The energy resolution is σ(E)/E =
15%/√E(GeV) ⊕ 0.4% for electrons. For charged pions,
the resolution is approximately 50%/√E(GeV), and for
jets approximately 80%/√E(GeV).
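
To make the resolution figures concrete, the sketch below evaluates a sampling term and a constant term added in quadrature, which is the form quoted for electrons. The function name is ours, and the 100 GeV evaluation point is an arbitrary example.

```python
import math


def fractional_resolution(energy_gev, sampling, constant=0.0):
    """sigma(E)/E = sampling / sqrt(E) combined in quadrature with a constant term."""
    return math.hypot(sampling / math.sqrt(energy_gev), constant)


if __name__ == "__main__":
    e = 100.0  # GeV, arbitrary example energy
    print("electron:", fractional_resolution(e, 0.15, 0.004))
    print("pion:    ", fractional_resolution(e, 0.50))
    print("jet:     ", fractional_resolution(e, 0.80))
```

At 100 GeV the sampling term dominates for electrons (about 1.5%), while the quoted jet resolution corresponds to roughly 8%.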

As can be seen in Fig. 1, the Main Ring beam pipe
penetrates the outer hadronic section of the calorimeters
and the muon spectrometer. The Main Ring carries protons
with energies between 8 and 150 GeV, and is used in
antiproton production during the Tevatron pp̄ running.
Because of this, any losses from the Main Ring can produce
backgrounds in the detector that must be removed.

C. Muon spectrometer 

The D0 experiment detects muons using proportional
drift tubes (PDT) and an iron toroid. Because muons
from top quark decays populate predominantly the central
region, this analysis uses muon detection systems in
the region |η| < 1.

The combined material in the calorimeter and iron
toroid has between 13 and 19 interaction lengths (the
range-out energy for muons is approximately 3.5 GeV),
making background from hadronic punchthrough negligible.
Also, the small central tracking volume minimizes
background from in-flight decays of pions and kaons.

A typical muon track is measured in four layers of
PDTs before, and six layers after, the iron toroid. The
six layers are constructed in two super-layers that are
separated by about one meter to provide a good lever
arm for measuring the muon momentum, p. The muon
momentum is determined from its deflection angle in the
magnetic field of the toroid. The momentum resolution is
limited by multiple scattering in the traversed material,
knowledge of the integrated magnetic field, and resolution
on the measurement of the deflection angle. The
resolution is roughly Gaussian in 1/p, and is approximately
σ(1/p) = 0.18(p − 2)/p² ⊕ 0.003 (with p in GeV/c)
for the algorithms that were used in this analysis.

TABLE I. Main running periods of the 1992-1996 run.

Run Period   Dates        Run Numbers    Integrated Luminosity
Ia           1992-1993    50000-70000    13.0 pb⁻¹
Ib           1993-1995    70000-94000    86.4 pb⁻¹
Ic           1995-1996    94000-96000    10.8 pb⁻¹

IV. DATA SAMPLE 

This section describes the data sample and the simulated
events for the tt̄ signal used in our analysis.

A. Initial selection criteria 

The data sample was selected by imposing both hardware
(Level 1) and software (Level 2) trigger requirements.
These requirements were modified slightly over
the course of the 1992-1996 run in order to accommodate
the higher instantaneous luminosities later in the
run. Table I indicates the three main running periods,
the run numbers associated with these periods, and the
integrated luminosity collected.

The hardware trigger required the presence of at least
four calorimeter trigger towers (0.2 × 0.2 in Δη × Δφ),
each with transverse energy E_T > 5 GeV, for the Ia
period. In the Ib and Ic periods, the E_T requirement was
raised to 7 GeV, and an additional requirement for at
least three large tiles (0.8 × 1.6 in Δη × Δφ) with E_T >
15 GeV was imposed. These tighter requirements were imposed to reduce
the trigger rate and avoid saturating the bandwidth of
the trigger system at high instantaneous luminosities (>
10³¹ cm⁻² s⁻¹).

The software filter required five jets, defined by
ℛ = √((Δη)² + (Δφ)²) = 0.3 cones, with |η| < 2.5 and E_T >
10 GeV. Again, in order to reduce the data rate at high
luminosities during the Ib period, a further condition was
added requiring the scalar sum of the E_T of all jets
(defined as H_T) to be greater than 110 or 115 GeV,
depending upon run number. This H_T requirement was
raised to 120 GeV during the Ic period. The effects of
these changes on the acceptance for tt̄ events were studied
using Monte Carlo simulations, and were found to be
negligible.
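
The Level 2 filter logic described above can be sketched in a few lines of Python. This is an illustration only: the function name and the (E_T, η) jet representation are ours, and we assume that H_T is summed over the same jets that pass the |η| and E_T requirements, which the text does not state explicitly.

```python
def passes_level2(jets, ht_threshold=None):
    """Sketch of the software filter for R = 0.3 cone jets.

    jets is a list of (E_T in GeV, eta) pairs. ht_threshold is None for
    periods without an H_T requirement, otherwise 110, 115, or 120 GeV.
    """
    selected = [et for et, eta in jets if abs(eta) < 2.5 and et > 10.0]
    if len(selected) < 5:
        return False
    if ht_threshold is None:
        return True
    return sum(selected) > ht_threshold  # H_T: scalar sum of jet E_T
```

For example, an event with five central 25 GeV jets has H_T = 125 GeV and passes every threshold listed, while five 20 GeV jets (H_T = 100 GeV) pass only when no H_T requirement is applied.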
TABLE II. Initial criteria used for data selection.

General          Sequential                   Effective        Cumulative Efficiency
Conditions       Requirements                 cross section    (m_t = 180 GeV/c²)

Level 1 trigger  Four trigger towers,         0.4 ± 0.1 μb     0.98
                 E_T > 5, 7 GeV (Ia, Ib-c)
                 Three large tiles,
                 E_T > 15 GeV (Ib-c)

Level 2 filter   Five ℛ = 0.3 jets,           20 ± 5 nb        0.92
                 |η| < 2.5, E_T > 10 GeV
                 H_T > 110, 115 GeV (Ib)
                 H_T > 120 GeV (Ic)

Offline          H_T > 115 GeV from           5.4 ± 1.3 nb     0.87
                 ℛ = 0.5 jet cones,
                 |η| < 2.5, E_T > 8 GeV
                 Cuts for spurious jets




In addition to imposing trigger and filter requirements,
a set of offline selection criteria was used to reduce the
data sample to a manageable size without greatly affecting
the acceptance for the tt̄ signal. First, H_T was required
to be greater than 115 GeV, where the sum used
ℛ = 0.5 jets with |η| < 2.5 and E_T > 8 GeV. Also, requirements
were imposed in order to eliminate events with
spurious jets due to spray from the Main Ring or effects
from noisy cells in the calorimeter. For example,
Fig. 2 shows the imbalance in transverse energy, or
missing E_T, in an event versus the azimuthal angle
(φ) of the jet, before and after the rejection of Main
Ring events. We see that our requirements have removed
the spurious cluster of jets in the region where the Main
Ring pierces the D0 detector (1.6 < φ < 1.8). Table II
summarizes the impact of the trigger and initial reconstruction
criteria on the tt̄ signal for a top quark mass of
180 GeV/c².

B. Jet algorithms 

The jet algorithm is the fundamental analysis tool in
the search for tt̄ events in the all-jets mode. One of
the most important considerations in choosing a jet algorithm
is the efficiency for reconstructing the six primary
tt̄ decay products. The η distribution of the jets from tt̄
decays tends to be quite narrow, and therefore the ℛ separation
between adjacent jets is frequently small. When
two jets are too close together, they may not be resolved,
leading to reconstruction inefficiency.

FIG. 2. The effect of imposing requirements to reject Main
Ring events. A scatter plot of missing E_T versus φ for jets
before (a), and after (b), imposing our Main Ring requirements.

Figure 3 shows the reconstruction efficiency for the
cone jet algorithm with various cone sizes for simulated
tt̄ events in the all-jets channel, as generated with
the HERWIG Monte Carlo program. Here, the definition
of a quark includes any final state gluon radiation
added back to the quark momentum. The matching of
reconstructed jets to quarks relies on using combinations
of the two that minimize the distance in ℛ between them.
A jet is considered to be matched only if that distance is
less than Δℛ = 0.5, the energy of the jet is within a factor
of two of the quark energy, and the reconstructed jet E_T
is greater than 10 GeV.
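
The three matching criteria above translate directly into code. The Python sketch below is illustrative: the dictionary layout and function names are ours, and the closest-pair-first (greedy) assignment is an assumed way of choosing "combinations that minimize the distance", since the exact combinatorial procedure is not spelled out here.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Distance in eta-phi space, with the phi difference wrapped to [-pi, pi]."""
    dphi = math.remainder(phi1 - phi2, 2.0 * math.pi)
    return math.hypot(eta1 - eta2, dphi)


def is_matched(jet, quark):
    """Apply the three criteria: distance, energy ratio, and jet E_T."""
    dr = delta_r(jet["eta"], jet["phi"], quark["eta"], quark["phi"])
    ratio = jet["energy"] / quark["energy"]
    return dr < 0.5 and 0.5 <= ratio <= 2.0 and jet["et"] > 10.0


def match_jets_to_quarks(jets, quarks):
    """Greedy pairing, closest pair first (an assumption). Returns {quark index: jet index}."""
    pairs = sorted(
        (delta_r(j["eta"], j["phi"], q["eta"], q["phi"]), ji, qi)
        for ji, j in enumerate(jets)
        for qi, q in enumerate(quarks)
    )
    used_jets, used_quarks, matches = set(), set(), {}
    for _, ji, qi in pairs:
        if ji in used_jets or qi in used_quarks:
            continue
        used_jets.add(ji)
        used_quarks.add(qi)
        if is_matched(jets[ji], quarks[qi]):
            matches[qi] = ji
    return matches
```

The single-jet efficiency is then the fraction of quarks that appear as keys in the returned dictionary, accumulated over many simulated events.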

Figures 3(a) and 3(b) show how the reconstruction efficiency
depends on quark E_T and η for the cone algorithm
with different cone sizes. The ℛ = 0.3 cone algorithm
shows a higher jet reconstruction efficiency than
the larger cone algorithms. In the central region, the
ℛ = 0.3 cone algorithm has an efficiency of 94%, while
the ℛ = 0.5 and ℛ = 0.7 cone algorithms are 90% and 81%
efficient, respectively. Given an average efficiency ε for
reconstructing a single jet, the reconstruction efficiency
for finding tt̄ events (with six or more jets) will be of the
order of ε⁶. Therefore, larger cone sizes are less efficient
in the multijet environment.
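
The ε⁶ scaling makes the cone-size dependence much steeper at the event level than at the single-jet level. A short Python check, using the central-region efficiencies quoted above and assuming (as the text does, to leading order) that the six jets are reconstructed independently:

```python
def event_efficiency(single_jet_efficiency, n_jets=6):
    """Order-of-magnitude event efficiency, assuming independent jets."""
    return single_jet_efficiency ** n_jets


if __name__ == "__main__":
    for cone, eff in [(0.3, 0.94), (0.5, 0.90), (0.7, 0.81)]:
        print(f"R = {cone}: single jet {eff:.2f}, six jets {event_efficiency(eff):.2f}")
```

The six-jet efficiencies come out at roughly 0.69, 0.53, and 0.28 for the three cone sizes, so a 13-point difference in single-jet efficiency becomes more than a factor of two at the event level.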

Figure 3(c) shows the correspondence between parton
and jet energies found for various cone algorithms, after
D0 jet energy corrections are applied (see next section).
Linear fits to the quark-jet correlation in energy
are shown in Fig. 3(c) for the three cone algorithms.
Figure 3(d) shows the three-jet invariant mass for the correct
combinations of jets matching top and antitop quarks.
The areas of the mass distributions reflect the event
reconstruction efficiencies for different algorithms.

The shift in the reconstructed mass from the input
mass of the top quark (175 GeV/c²) shows that the jet
algorithms are not equivalent. The shift in three-jet mass
from the nominal input top quark mass increases as the
cone radius is decreased. The widths of the mass distributions
are not very sensitive to the choice of cone size. The
overall root-mean-square (RMS) spread in reconstructed
mass for correct combinations of jets is approximately
10% of the mass.

FIG. 3. Jet reconstruction for tt̄ Monte Carlo events (HERWIG,
m_t = 175 GeV/c²) for various cone sizes: ℛ = 0.3 (open
squares), ℛ = 0.5 (full circles), and ℛ = 0.7 (open circles). (a)
Jet finding efficiency versus quark E_T. (b) Jet finding efficiency
versus quark η. (c) Reconstructed jet energy versus
that of the input quark. (d) Reconstructed mass of the top
quark from correct jet combinations, where the areas reflect
the relative efficiencies.

In summary, there are two competing effects when
choosing the optimal jet cone size. Smaller cone sizes
are better able to resolve separate jets, but do not do as
well at reconstructing jet energy. However, the ability
to resolve individual jets was deemed of higher importance
in the search for a signal. Hence the ℛ = 0.3 cone
algorithm is preferred for analyzing multijet events. But,
due to the relatively large shift in the jet energy for the
ℛ = 0.3 cone algorithm, we chose to use the ℛ = 0.5 cone
algorithm for calculating some quantities that emphasize
energy response at the expense of jet efficiency. Jets with
E_T < 8 GeV, before application of energy corrections
(see Sec. IV.C), were discarded.


C. Jet energy correction 

D0 has developed a correction procedure to calibrate
jet energies, which is applied to both data and
Monte Carlo. The underlying assumption is that the
true jet energy, E_ptcl, is the sum of the energies of all final
state particles entering the cone algorithm applied at
the calorimeter level. E_ptcl is obtained from the energy
measured in the calorimeter, E_meas, as follows:

    E_{ptcl} = \frac{E_{meas} - E_O(\mathcal{R}, \eta, \mathcal{L})}{R(\eta, E, \mathrm{RMS})\, S(\mathcal{R}, \eta, E)}    (4.1)

where:

• E_O(ℛ, η, ℒ) is an offset, which includes the physics
of the underlying event, noise from the radioactive
decay of the uranium absorber, the effect of previous
crossings (pile-up), and the contribution of
additional contemporaneous pp̄ interactions. The
physics of the underlying event is defined as the
energy contributed by spectators to the hard parton
interaction which resulted in the high-p_T event.
This offset increases as a function of the cone size
ℛ. It also depends on η and on the instantaneous
luminosity, ℒ, which is related to the contribution
from the additional pp̄ interactions.

• R(η, E, RMS) is the energy response of the calorimeter.
It is nearly independent of the jet cone size,
ℛ, but does depend on the RMS width of the jet.
The width dependence accounts for differences in
the calorimeter response to narrow jets, which fragmented
into fewer particles (of, on average, higher
energy) than broader jets, with larger particle multiplicities.
Because the various detector components
are not identical, R also depends on detector
η. R is typically less than one, due to energy loss
in the uninstrumented regions between modules,
differences between the electromagnetic (e) and
hadronic response (h) of the detector (e/h > 1),
and module-to-module inhomogeneities.

• S(ℛ, η, E) is the fraction of the jet energy that is
deposited inside the algorithm cone. Since the jet
energy is corrected back to the particle level, the
effects of calorimeter showering must be removed.
S is less than one, meaning that the effect of showering
is a net flux of energy from inside to outside
the cone. S depends strongly on the cone size ℛ,
energy, and η.
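
The correction formula amounts to subtracting the offset and then dividing by the response and the in-cone fraction. The Python sketch below shows only that arithmetic; the function name is ours and the numerical inputs in the example are hypothetical placeholders, since the actual offset, response, and showering functions are not tabulated in this section.

```python
def corrected_jet_energy(e_meas, offset, response, showering):
    """Particle-level energy: (E_meas - E_O) / (R * S).

    offset, response, and showering stand for the values of E_O, R, and S
    already evaluated for this jet's cone size, eta, energy, width, and luminosity.
    """
    if response <= 0.0 or showering <= 0.0:
        raise ValueError("response and showering fraction must be positive")
    return (e_meas - offset) / (response * showering)


if __name__ == "__main__":
    # Hypothetical inputs: 50 GeV measured, 2 GeV offset, R = 0.85, S = 0.95.
    print(corrected_jet_energy(50.0, 2.0, 0.85, 0.95))
```

Because both R and S are typically below one, the corrected energy is usually larger than the measured energy even after the offset is removed.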


D. Characteristics of jets 

Comparisons of jet properties (jet multiplicity, inclusive
jet E_T, η, and φ, for ℛ = 0.3 cones) are shown in
Fig. 4 for data from the Ia and Ib periods (see Table I)
and for tt̄ Monte Carlo. Only jets with E_T > 10 GeV and
|η| < 2 are included in the comparison. The results from
Ia and Ib are in good agreement, although Ib typically
had higher instantaneous luminosity.

Figure 4(a) shows that for events with six jets, the
background (i.e., data) is at least three orders of magnitude
larger than the expected tt̄ signal. The peak at five
jets is the result of the initial event selection (see Table
II). The inclusive jet E_T spectrum in Fig. 4(b) falls
exponentially at about the same rate for signal as for data,
and the signal is consistently three orders of magnitude
below the data. In Fig. 4(c), the distributions of jet η are
normalized to the same area for signal and data. The signal
is concentrated in the central region, while the data
extend to higher η. There is a difference of the order of
10% between Ia and Ib in the intercryostat region (|η| ≈
1.2) due to improvements in the Ib period. Figure 4(d)
shows that the φ distribution of jets is isotropic, except
for a 5% suppression in the region of the Main Ring. The
Monte Carlo does not simulate the effects of the Main
Ring, and consequently has no apparent structure in φ.

FIG. 4. Properties of jets with ℛ = 0.3 cones. Data from
the Ia (histograms) and Ib (circles) periods, and tt̄ HERWIG for
m_t = 175 GeV/c² (shaded histograms). Only jets with E_T >
10 GeV and |η| < 2 are included. Distributions in (a) jet multiplicity
and (b) jet E_T are each normalized to the expected
number of events in 110.3 pb⁻¹ of data, while distributions
in (c) jet |η| and (d) jet φ are normalized to the same area.

FIG. 5. Comparisons of ISAJET (circles) and HERWIG (histograms)
for an input top quark mass of 175 GeV/c², and
jets with ℛ = 0.3 cones, for (a) jet multiplicity, (b) jet η, (c)
E_T of leading jet, and (d) fifth highest jet E_T. Bars on the
points indicate statistical uncertainties (similar uncertainties,
although not shown, apply for the histograms). The results
from ISAJET and HERWIG in (a)-(d) are normalized to the
same area.

E. Simulation of ti events 

The simulation of tt̄ events plays an important role
in extracting a signal in the presence of significant background.
It is necessary, therefore, to have a good description
of the production and decay of tt̄ events in order to
calculate detector acceptances accurately and to develop
methods to identify tt̄ events in the data.

The tt̄ events were generated for top quark masses between
120 and 220 GeV/c² for the reaction pp̄ → tt̄ + X
using HERWIG as a primary model and ISAJET as a
check. The underlying assumptions in the fragmentation
of partons are different in the two programs. The generated
events were put through the D0 shower library,
a fast detector simulation package based on GEANT,
which contains the effects of cracks and other dead material
in the D0 calorimeter, and provides accurate shower
simulation. The GEANT simulation has been tuned to
achieve a good match between generated single-particle
characteristics and observed data. Events were subsequently
digitized, passed through the D0 reconstruction
program, and subjected to the same selection
criteria as the data (see Table II). Events passing these
criteria served as the model for our studies of tt̄ properties.

Generally, acceptances for tt̄ production as calculated
with HERWIG or ISAJET agree to within 10%, and any differences
between the two are incorporated in the final systematic
uncertainties. As an illustration of the discrepancies,
we show in Fig. 5 distributions of jet multiplicity, jet
η, the E_T of the leading jet, and the fifth highest jet E_T
for HERWIG and ISAJET. Except for jet multiplicity, these
distributions are in good agreement. It has been shown
that ISAJET produces more gluon radiation than HERWIG,
in accord with our results in Fig. 5(a).

V. KINEMATIC PARAMETERS 

The principal background to the tt̄ signal is QCD multijet
production, which is dominated by a 2 → 2 parton
process with additional jets produced through gluon radiation.
Therefore, the background tends to have jets
that are more forward-backward in rapidity. The additional
jets are generally lower in E_T (i.e., softer) than the
initial outgoing parent partons. Furthermore, this extra
radiation tends to lie in a plane formed by the incoming
beam and the two leading jets.

Because the mass of the top quark is large, the characteristic
energy scale (commonly called Q²) of the tt̄ event
is significantly larger than that of the average QCD background
event. This means that tt̄ events generally have
jets with higher E_T, and have larger multijet invariant
masses.

Extracting a signal from data dominated by background
requires the use of global kinematic parameters
based on these differences. Employing such parameters
helps to differentiate between the tt̄ signal and background.
We can summarize the salient features of the
background, relative to the tt̄ signal, as follows:

• The overall energy scale is lower; leading jets have
lower E_T; multijet invariant masses are smaller.

• The additional radiated jets are softer (have lower
E_T).

• The event shape is more planar (less spherical).

• The jets are more forward-backward in rapidity
(less central).

We defined two or more kinematic parameters that
quantified aspects of each property. Only the most effective
of these were used and these are discussed below.
We found that, in general, better discrimination
was achieved using ℛ = 0.3 cone jets (with |η| < 2.0 and
E_T > 15 GeV) than ℛ = 0.5 cone jets. However, in some
instances, ℛ = 0.5 cone jets were used, and this is noted
where it occurs.

Although correlations exist between many of the kinematic
parameters, each includes useful information not
fully contained in any of the others. These correlations
are presented in Sec. VI.D.

A. Parameters sensitive to energy scale 

Any parameter that depends on the energy scale of
the jets is also sensitive to the mass of the top quark.
These “mass sensitive” parameters usually provide better
discrimination against QCD background than other
parameters that provide only a measure of some topological
feature. Three mass sensitive parameters are:

1. H_T

The sum of the transverse energies of jets in a given
event characterizes the transverse energy flow, and
is defined as:

    H_T = \sum_{j=1}^{N_{jets}} E_{T_j}    (5.1)

where E_{T_j} is the transverse energy of the jth jet,
as ordered in decreasing jet E_T rank, and N_jets is
the number of jets in the event.

2. √ŝ

This parameter is the invariant mass of the N_jets
system.

3. E_{T_1}/H_T

E_{T_1} is the transverse energy of the ℛ = 0.5 cone jet
with highest E_T. This parameter characterizes the
E_T fraction carried by the leading jet, and tends
to be high for QCD background. The tt̄ events are
likely to have transverse energy roughly equipartitioned
among all six jets, and hence the leading E_T
jet is, on average, fractionally softer.

FIG. 6. The H_T, √ŝ, and E_{T_1}/H_T distributions for data
(predominantly background) and for HERWIG tt̄ generated at
a top quark mass of 175 GeV/c². Each plot on the left is normalized
according to the expected number of events. On the
right the plots are normalized to unity and reveal significant
discrimination between signal and background.

Figure 6 shows the distributions of H_T, √ŝ, and
E_{T_1}/H_T, each of which reveals significant discrimination
between signal and background. This and subsequent
figures for the parameters are shown both normalized to
cross section, and normalized to unity.
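
The three energy-scale parameters can be computed from a list of jets as in the following Python sketch. It is an illustration under simplifying assumptions: jets are given either as E_T values or as (E, p_x, p_y, p_z) four-vectors, the function names are ours, and the distinction between ℛ = 0.3 and ℛ = 0.5 cone jets noted in the text is ignored.

```python
import math


def h_t(jet_ets):
    """Scalar sum of the jet transverse energies."""
    return sum(jet_ets)


def invariant_mass(four_vectors):
    """Invariant mass of the jet system from (E, px, py, pz) tuples."""
    e = sum(v[0] for v in four_vectors)
    px = sum(v[1] for v in four_vectors)
    py = sum(v[2] for v in four_vectors)
    pz = sum(v[3] for v in four_vectors)
    m2 = e * e - px * px - py * py - pz * pz
    return math.sqrt(max(m2, 0.0))  # guard against small negative rounding


def leading_et_fraction(jet_ets):
    """E_T of the leading jet divided by H_T."""
    return max(jet_ets) / h_t(jet_ets)
```

For six jets of equal E_T the leading fraction is 1/6, the equipartition limit mentioned above, while a dijet-like event with soft extra radiation gives a value much closer to 1/2.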

B. Parameters sensitive to additional radiation 

As previously noted, the QCD background is primarily
a 2 → 2 parton process that contains additional radiated
gluons. These gluons tend to be much softer than the
leading partons, and therefore the jets associated with
this radiation tend to have smaller E_T. Three parameters
that measure the hardness of this radiation are:

4. H_T^{3j}

This variable is defined as

    H_T^{3j} = H_T - E_{T_1} - E_{T_2}    (5.2)

where E_{T_1} and E_{T_2} are the transverse energies of
the two leading (highest E_T) jets. By subtracting
the E_T of the two leading jets, what remains is a
better measure of any additional gluon radiation in
QCD events, enhancing the discrimination between
tt̄ signal and QCD background.

5. N_jets^A

An average jet count parameter, N_jets^A, provides a
way to parameterize the number of jets in an event,
while taking account of the hardness of these jets.
We define:

    N_{jets}^{A} = \frac{\int_{15}^{55} E_T^{thr}\, N(E_T^{thr})\, dE_T^{thr}}{\int_{15}^{55} E_T^{thr}\, dE_T^{thr}}    (5.3)

where N(E_T^thr) is the number of jets in a given event
with |η| < 2.0 and E_T greater than some threshold,
E_T^thr in GeV. Therefore, this parameter corresponds
to the number of jets, but is more sensitive to jets
of higher E_T than just a simple jet count above
some given threshold.

6. E_{T_{5,6}}

The transverse energies of the fifth jet, E_{T_5}, and
sixth jet, E_{T_6}, are also useful in discriminating
QCD background from tt̄ events. Our final selection
(see Sec. VII(a)) requires at least six jets. For
background these usually correspond to soft radiation.
The variable chosen is:

    E_{T_{5,6}} = \sqrt{E_{T_5}\, E_{T_6}}    (5.4)


Figure 7 shows distributions of H_T^{3j}, N_jets^A, and E_{T_{5,6}}.
Again, these variables are effective in differentiating
between signal and background.
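
Two of the radiation-sensitive parameters lend themselves to a compact Python sketch. The function names are ours. For the average jet count we assume an E_T-weighted integral of N(E_T^thr) over thresholds from 15 to 55 GeV, normalized by the same integral without N; these limits are our reading of the damaged equation and should be treated as an assumption. Because N(E_T^thr) is a sum of step functions, the integral can be evaluated jet by jet in closed form.

```python
def h_t_3j(jet_ets):
    """H_T with the two highest-E_T jets removed."""
    ordered = sorted(jet_ets, reverse=True)
    return sum(ordered[2:])


def average_jet_count(jet_ets, lo=15.0, hi=55.0):
    """E_T-weighted average jet count over thresholds in [lo, hi] (assumed limits).

    jet_ets should already be restricted to jets with |eta| < 2.0.
    A jet with E_T = e contributes the integral of E from lo to min(e, hi).
    """
    denominator = (hi ** 2 - lo ** 2) / 2.0
    numerator = 0.0
    for et in jet_ets:
        upper = min(et, hi)
        if upper > lo:
            numerator += (upper ** 2 - lo ** 2) / 2.0
    return numerator / denominator
```

With these limits a jet above 55 GeV counts as one full jet, a jet below 15 GeV counts as zero, and a 35 GeV jet counts as about 0.36 of a jet, which is the sense in which the parameter weights harder jets more strongly than a simple count.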


C. Aplanarity and sphericity 






FIG. 7. The and Et^ g distributions for data 

(predominantly background) and for herwig tt events. Each 
distribution is normalized to the expected number of events 
(left) and to unity (right). 


The direction and shape of the momentum flow of jets
in $t\bar{t}$ production are different from those in QCD
background. These differences can be quantified using
event-shape parameters. For each event, we define the
normalized momentum tensor $M_{ab}$:

$M_{ab} = \frac{\sum_{j=1}^{N_{jets}} p_{ja}\, p_{jb}}{\sum_{j=1}^{N_{jets}} p_j^2}$   (5.5)

where $a$ and $b$ run over the $x$, $y$, $z$ components (indices
of the tensor), and $j$ runs over the number of jets in
an event. As is clear from its definition, $M_{ab}$ is a
symmetric matrix that is always diagonalizable, and has
non-negative eigenvalues $(Q_1, Q_2, Q_3)$ satisfying the
conditions:

$Q_1 + Q_2 + Q_3 = 1$ and $0 \le Q_1 \le Q_2 \le Q_3$.   (5.6)

The equation $Q_1 + Q_2 + Q_3 = 1$ represents a plane in
a space spanned by $Q_1$, $Q_2$, and $Q_3$, and the inequality
restricts the range of each eigenvalue, as shown in Fig. 8:

$0 \le Q_1 \le \frac{1}{3}$,   (5.7)
$0 \le Q_2 \le \frac{1}{2}$,
$\frac{1}{3} \le Q_3 \le 1$.

The magnitude of any $Q_i$ represents the portion of
momentum flow in the direction of the eigenvector.
Limiting event shapes can therefore be characterized as
follows:

• Linear: $Q_1 = Q_2 = 0$ and $Q_3 = 1$.

• Planar: $Q_1 = 0$ and $Q_2 = Q_3 = \frac{1}{2}$.

• Spherical: $Q_1 = Q_2 = Q_3 = \frac{1}{3}$.

The aplanarity ($A$) and sphericity ($S$) parameters that
we use are defined as follows:

7. $A = \frac{3}{2} Q_1$,

8. $S = \frac{3}{2} (Q_1 + Q_2)$,

with $0 \le A \le 0.5$ and $0 \le S \le 1$.

Top quark ($t\bar{t}$) events tend to have higher aplanarity
and sphericity than background events. We calculate
$A$ and $S$ in the $p\bar{p}$ collision frame; little difference is found
using the parton center of mass frame. Distributions of
$A$ and $S$ for HERWIG $t\bar{t}$ events for $m_t$ = 175 GeV/$c^2$ and
for data are shown in Fig. 9.
D. Parameters sensitive to rapidity distributions



FIG. 8. The allowed range of normalized momentum tensor
eigenvalues in the space spanned by the $Q_i$.

FIG. 9. The aplanarity and sphericity distributions for
data (predominantly background), and for HERWIG $t\bar{t}$ events.
Each distribution is normalized to the expected number of
events (left) and to unity (right).


9. $C$

The centrality ($C$) parameter is defined as:

$C = \frac{H_T}{H_E}$   (5.8)

where

$H_E = \sum_{j=1}^{N_{jets}} E_j$   (5.9)

Centrality is similar to $H_T$, characterizing the
transverse energy in events, but is normalized in
such a way that it depends only weakly on the mass
of the top quark.


10. $\langle\eta^2\rangle$

To good approximation, the $\eta$ distribution for jets
in $t\bar{t}$ events is normally distributed about zero with
an RMS, $\sigma_\eta$, close to unity. With typically six or
more jets in an event, the RMS of the jet $\eta$
distribution can be a useful discriminator. The $\langle\eta^2\rangle$
variable is defined using only the leading six jets.
We use $\mathcal{R}$ = 0.5 cone jets for this variable.

We calculate $\langle\eta^2\rangle$ by taking the square of the
difference between each jet $\eta$ and the $E_T$-weighted mean,
$\bar{\eta}$, weighted by a factor $W(E_T)$. $W(E_T)$ depends
upon the difference in RMS between $t\bar{t}$ signal ($\sigma_\eta^{t\bar{t}}$)
and background ($\sigma_\eta^{QCD}$), and is larger at those $E_T$
values where signal and background are expected
to differ. The $\langle\eta^2\rangle$ parameter is given by:

$\langle\eta^2\rangle = \frac{\sum_{j} W(E_{T_j})\, (\eta_j - \bar{\eta})^2}{\sum_{j} W(E_{T_j})}$   (5.10)

where

$W(E_T) = \frac{\sigma_\eta^{t\bar{t}}(E_T) - \sigma_\eta^{QCD}(E_T)}{\sigma_\eta^{t\bar{t}}(E_T)}$   (5.11)

and

$\bar{\eta} = \frac{\sum_{j} E_{T_j}\, \eta_j}{\sum_{j} E_{T_j}}$.   (5.12)

Note that both $\sigma_\eta^{t\bar{t}}(E_T)$ and $\sigma_\eta^{QCD}(E_T)$ depend on
the $E_T$ of the jets in the $\eta$ distribution. Jets with
lower $E_T$ tend to be at larger values of $|\eta|$, and
consequently $\sigma_\eta$ decreases with increasing $E_T$. The
QCD multijet background has a broader distribution
in the $\langle\eta^2\rangle$ variable than the $t\bar{t}$ signal.

The $C$ and $\langle\eta^2\rangle$ distributions are shown in Fig. 10, for
$m_t$ = 175 GeV/$c^2$.

The above ten kinematic variables are employed as inputs
to the first neural network. The output of this network
is an input to the second (and final) neural network,
whose three other inputs are described in the following
section.

FIG. 10. The centrality and $\langle\eta^2\rangle$ distributions for data
(predominantly background) and for HERWIG $t\bar{t}$ events. Each
distribution is normalized to the expected number of events
(left) and to unity (right).

VI. EVENT STRUCTURE VARIABLES 

In addition to the kinematic and topological characteristics
examined in Sec. V, there are other differences
between the $t\bar{t}$ signal and the QCD multijet background
that we will exploit in extracting the $t\bar{t}$ signal.

A. $p_T$ of tagging muon

The $p_T$ of the tagging muon gives further discrimination
between $t\bar{t}$ signal and QCD background. Not only
does the fragmentation of $b$ quarks produce higher $p_T$
objects, but the $b$ quark is also more energetic in $t\bar{t}$ events
than in background. Thus, the mean muon $p_T$ is
significantly larger in $t\bar{t}$ events. Figure 11 shows the muon
$p_T$ spectra. Figure 11(a) compares the muon $p_T$ in HERWIG
and ISAJET $t\bar{t}$ events, which shows that the muon
$p_T$ spectrum is modeled consistently by Monte Carlo.
Figure 11(b) compares HERWIG $t\bar{t}$ events and data
(predominantly background). These results show that the $p_T$
of the muon can serve as a useful tool in differentiating
between signal and background.

B. Widths of jets 

At the simplest level, each $t$ ($\bar{t}$) quark decays into a
$b$ ($\bar{b}$) quark and a $W^+$ ($W^-$) boson, with each $W$ boson
decaying into light quarks. Barring extra gluon
bremsstrahlung, this represents six quark-jets in the final
state. The average jet multiplicity for HERWIG $t\bar{t}$ events
($m_t$ = 175 GeV/$c^2$) using our selection criteria is 6.9,
implying that the contribution from gluons is relatively
small. Conversely, jets in the QCD multijet background
originate predominantly from gluon radiation. Although
gluon splitting can take place, producing both quark and
gluon jets, it is expected that gluons dominate QCD
multijet production.

FIG. 11. Comparison of muon $p_T$ spectra for (a) HERWIG
and ISAJET $t\bar{t}$ events, and (b) HERWIG $t\bar{t}$ events and data.
These distributions have been normalized to unity.

QCD predicts substantial differences between quark
jets and gluon jets and, in fact, observed differences in
quark and gluon jet widths have been reported by
experiments at the KEK $e^+e^-$ collider (TRISTAN) and the
CERN $e^+e^-$ collider (LEP). Parton shower Monte
Carlos such as HERWIG have been shown to reproduce the
widths observed in data, although HERWIG has been
found to slightly underestimate jet widths at the
Fermilab Tevatron. We found that by applying a correction
of 3% to the widths, HERWIG QCD Monte Carlo
reproduces the observed distributions in the width of the jets.
Further studies have shown that the kinematic
distributions of the multijet background are also well modeled
using HERWIG. We have therefore chosen HERWIG as the
generator for studying jet widths, with a 3% correction
applied to the widths of each jet.

Figure 12(a) shows the mean width of 0.5 cone jets
versus jet $E_T$ for multijet data and HERWIG QCD, and
Fig. 12(b) compares the data to HERWIG $t\bar{t}$. Here, the jet
width is:

$\sigma_{jet} = \sqrt{\sigma_\eta^2 + \sigma_\phi^2}$   (6.1)

where $\sigma_\eta$ and $\sigma_\phi$ are the transverse-energy-weighted RMS
widths in $\eta$ and $\phi$, respectively, and are calculated using
the $(\eta, \phi)$ positions of each calorimeter bin (0.1 x 0.1 in
$\Delta\eta \times \Delta\phi$) weighted by the transverse energy in that bin.

FIG. 12. The mean width of 0.5 cone jets versus their $E_T$
for (a) data (bars) and HERWIG QCD (stars), and (b) data
(bars) and HERWIG $t\bar{t}$ (stars).

In order to account for the broadening of jets from
additional minimum bias interactions which could overlap
an event, corrections were applied to the widths of each
jet in the event. These corrections were typically a few
percent, and depended, among other factors, upon the
instantaneous luminosity during that event. These
corrections were determined by assuming that the energy
coming from minimum bias interactions was uniformly
distributed in $\Delta\eta$ and $\Delta\phi$, and therefore the measured
RMS of a jet was the sum in quadrature of its true RMS
and the RMS of a uniform distribution.

It is clear from Fig. 12(a) that HERWIG QCD describes
the widths observed in the data, and that HERWIG $t\bar{t}$ has
significantly narrower jets. This suggests that the
difference may be due to the different mix of gluons and
quarks in the two processes.

For Monte Carlo it is possible to match initial state
quarks to final state reconstructed jets because the
HERWIG $t\bar{t}$ events are relatively simple. The mapping
between quarks and jets requires a tight match in $\Delta\mathcal{R}$
between the initial quark and the jet, as well as a reasonable
match in energy. The following criteria were employed to
define Monte Carlo “quark-like jets”:

• Good quality 0.5 cone jet, reconstructed without
merging (not formed from two or more adjacent
jets) and with $|\eta| < 2.5$,

• Distance between initial quark and its
reconstructed jet to be $\Delta\mathcal{R} < 0.05$,

• The difference in energy between the quark and the
jet $\Delta E < \sqrt{E_{quark}}$ ($E$ in GeV).

Monte Carlo “gluon-like jets” were defined to be good
quality jets, without merging, but where the separation
distance to the nearest quark was $\Delta\mathcal{R} > 1$. Imposing
these criteria, the distributions in the jet RMS widths are
shown in Fig. 13. To guide the eye, Gaussian fits have
been superimposed on the distributions. With these
definitions, it appears that gluon-like jets are 20-30% wider
than quark-like jets.

FIG. 13. Distributions in jet RMS width, $\sigma_{jet}$, for HERWIG
quark-like jets (solid) and the gluon-like jets (dashed) for (a)
$5 < E_T < 25$ GeV, (b) $20 < E_T < 40$ GeV, (c) $35 < E_T < 55$
GeV, (d) $50 < E_T < 70$ GeV, (e) $65 < E_T < 85$ GeV, and (f)
$80 < E_T < 100$ GeV. These distributions were normalized to
have equal numbers (1000) of events.

Figure 13 suggests that the jet RMS distributions for
these definitions of quark/gluon jets can be approximated
by Gaussians. A Fisher discriminant can be used to
differentiate statistically between any two such
distributions. We defined a Fisher discriminant, $\mathcal{F}_{jet}$, in terms of
the individual jet width $\sigma_{jet}$ and the width expected for
gluon-like ($\sigma_{gluon}$) and quark-like ($\sigma_{quark}$) jets, as follows:

$\mathcal{F}_{jet} = \frac{(\sigma_{jet} - \sigma_{quark}(E_T))^2}{\sigma_{quark}^2(E_T)} - \frac{(\sigma_{jet} - \sigma_{gluon}(E_T))^2}{\sigma_{gluon}^2(E_T)}$   (6.2)

We used this single parameter to characterize the
quark-like or gluon-like essence of a jet. This
discriminant is summed over the four unmerged jets with the
smallest values of $\mathcal{F}_{jet}$ in an event to form a variable $\mathcal{F}$
which reflects whether the event is more $t\bar{t}$-like (signal)
or more QCD-like (background). Summing only over the
four smallest values of $\mathcal{F}_{jet}$ (most quark-like jets),
according to Monte Carlo, optimizes the discrimination. This
summed discriminant, $\mathcal{F}$, will be used in our search for
$t\bar{t}$ signal in the all-jets channel. The distributions of $\mathcal{F}$
are shown in Fig. 14. It is known that jet widths are not
as well modeled in ISAJET, and we have, therefore,
based this discriminant only on the HERWIG generator.

FIG. 14. Distributions of $\mathcal{F}$ for (a) data (predominantly
background) and HERWIG QCD, and (b) data and HERWIG $t\bar{t}$
events.

Figure 14(a) shows $\mathcal{F}$ for data and HERWIG QCD, and
Fig. 14(b) shows $\mathcal{F}$ for data and HERWIG $t\bar{t} \to$ all-jets.
Comparison shows that the jets in data are significantly
wider, and are more consistent with HERWIG QCD than
with HERWIG $t\bar{t}$.
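
The construction of the event-level discriminant can be sketched in a few lines. This is only an illustration under stated assumptions: the per-jet discriminant is taken to be the difference of two normalized squared deviations (from the quark-like and gluon-like expected widths), and the expected widths are supplied as caller-defined functions of $E_T$. All names here are hypothetical.

```python
def fisher_jet(width, quark_width, gluon_width):
    """Per-jet discriminant; small values mean the jet looks quark-like.

    Assumed form: ((w - wq) / wq)^2 - ((w - wg) / wg)^2.
    """
    return (((width - quark_width) / quark_width) ** 2
            - ((width - gluon_width) / gluon_width) ** 2)


def event_fisher(jets, quark_width_at, gluon_width_at, n_sum=4):
    """Sum the n_sum smallest per-jet values for one event.

    jets: iterable of (et, width) pairs for unmerged jets.
    quark_width_at, gluon_width_at: hypothetical callables giving the
    expected width at a given E_T.
    """
    values = sorted(
        fisher_jet(width, quark_width_at(et), gluon_width_at(et))
        for et, width in jets
    )
    return sum(values[:n_sum])
```

Sorting before summing is what restricts the sum to the most quark-like jets, so an event with four narrow jets and several wide ones still receives a signal-like value.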


C. Mass likelihood parameter 

A mass likelihood variable, $\mathcal{M}$, defined below, provides
good discrimination between signal and background by
requiring two jet pairs that are consistent with the $W$
boson mass, and two $W$+jet combinations that are consistent
with a single top quark mass of any value. Since there
are no high-$p_T$ leptons in the all-jets channel, and hence
no high-$p_T$ neutrinos, the event is in principle fully
reconstructible. The presence of two $W$ bosons in $t\bar{t}$ events
provides significant rejection against QCD background.
A further requirement that the two reconstructed top
quarks have equal masses provides some additional
discrimination. $\mathcal{M}$ is defined as a $\chi^2$-like object:

$\mathcal{M} = \frac{(M_{W_1} - M_W)^2}{\sigma_W^2} + \frac{(M_{W_2} - M_W)^2}{\sigma_W^2} + \frac{(m_{t_1} - m_{t_2})^2}{\sigma_t^2}$   (6.3)

where $M_{W_1}$ ($M_{W_2}$) is the mass of the two $\mathcal{R}$ = 0.5 cone jets
corresponding to the $W$ boson from the first (second) top
quark, of mass $m_{t_1}$ ($m_{t_2}$). The parameters $M_W$, $\sigma_W$ and
$\sigma_t$ were fixed at 80, 16 and 62 GeV/$c^2$, respectively. The
last two values approximate the full widths of the two
distributions, and taking them to be constant simplifies
the calculation.

FIG. 15. Distribution in mass likelihood parameter for (a)
HERWIG and ISAJET $t\bar{t}$ events, (b) HERWIG QCD and data,
and (c) HERWIG $t\bar{t}$ events and data. These distributions were
normalized to unity.

The $\mathcal{M}$ variable is calculated by looping over
combinations of jets, and assigning all jets with $|\eta| < 2.5$ to
one of the $W$ bosons or $b$ quarks from the two top quark
decays. The smallest value of $\mathcal{M}$ is selected as the
discriminator. To reduce the number of combinations, two
jets are assigned to each $W$ boson, and one to the $b$ quark
from one of the two top quarks. Jets from the $W$ boson
are required to have $E_T > 10$ GeV, while those from the
$b$ quark must have $E_T > 15$ GeV. All remaining jets are
assigned to the $b$ quark from the second top quark. To
keep $b$-tagged events on the same footing as untagged
events, no a priori assignment is made between tagged
jets and $b$ quarks. Since in the top quark rest frame the
$W$ boson and the $b$ quark have equal momenta, the $E_T$
of $W$ bosons and $b$-jets are more similar than for QCD
background. The following criterion helps further reduce
combinatorics:

• $E_T(W_1) + E_T(W_2) \le 0.65\, H_T$,

where $E_T(W_1)$ ($E_T(W_2)$) is the $E_T$ from the vector sum of
two jet momenta assigned to the $W$ boson from the first
(second) top quark. Although there are other possible
algorithms for assigning jets to the two top quarks, the
discrimination in the $\mathcal{M}$ variable is not very sensitive to
the choice of reasonable algorithms.
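
The combinatorial search described above can be sketched as follows. This is a minimal illustration, not the analysis code. It assumes jets are given as $(E, p_x, p_y, p_z)$ four-vectors that already satisfy $|\eta| < 2.5$, takes the mass likelihood to be a sum of three squared, width-normalized mass differences with the constants 80, 16 and 62 GeV/$c^2$, and omits the additional $H_T$-based criterion. Function names are hypothetical.

```python
import itertools
import math

M_W, SIGMA_W, SIGMA_T = 80.0, 16.0, 62.0  # GeV/c^2, constants quoted in the text


def total(vectors):
    """Component-wise sum of (E, px, py, pz) four-vectors."""
    return tuple(sum(c) for c in zip(*vectors))


def mass(v):
    e, px, py, pz = v
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def transverse_energy(v):
    e, px, py, pz = v
    p = math.sqrt(px * px + py * py + pz * pz)
    return e * math.hypot(px, py) / p if p > 0.0 else 0.0


def mass_likelihood(jets):
    """Smallest mass-likelihood value over jet assignments, or None if none is allowed."""
    best = None
    indices = range(len(jets))
    for w1 in itertools.combinations(indices, 2):
        rest = [i for i in indices if i not in w1]
        for w2 in itertools.combinations(rest, 2):
            # Jets assigned to a W must have E_T > 10 GeV.
            if any(transverse_energy(jets[i]) <= 10.0 for i in w1 + w2):
                continue
            spare = [i for i in rest if i not in w2]
            for b1 in spare:
                # The single b jet must have E_T > 15 GeV; all other jets go to the second b.
                b2 = [i for i in spare if i != b1]
                if not b2 or transverse_energy(jets[b1]) <= 15.0:
                    continue
                m_w1 = mass(total([jets[i] for i in w1]))
                m_w2 = mass(total([jets[i] for i in w2]))
                m_t1 = mass(total([jets[i] for i in w1 + (b1,)]))
                m_t2 = mass(total([jets[i] for i in w2 + tuple(b2)]))
                value = (((m_w1 - M_W) / SIGMA_W) ** 2
                         + ((m_w2 - M_W) / SIGMA_W) ** 2
                         + ((m_t1 - m_t2) / SIGMA_T) ** 2)
                if best is None or value < best:
                    best = value
    return best
```

For a six-jet event this loop visits 180 assignments, which illustrates why restricting the jet multiplicity and applying the $E_T$ requirements keeps the calculation manageable.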

The distributions in the $\mathcal{M}$ variable are shown in
Fig. 15. Figure 15(a) compares the $\mathcal{M}$ variable in
HERWIG and ISAJET $t\bar{t}$ events ($m_t$ = 175 GeV/$c^2$).
Figure 15(b) compares HERWIG QCD and the data
(predominantly background). Figure 15(c) compares HERWIG $t\bar{t}$
events and data. These plots show that this variable
is modeled consistently by the two $t\bar{t}$ Monte Carlos, that
HERWIG QCD models the background well, and that $\mathcal{M}$ is
useful in discriminating between signal and background.

TABLE III. The 13 variables used in the neural network
analysis, the jet cone size employed and the $t\bar{t}$ event
characteristic upon which it discriminates are given.

Variable                    Description                               Cone      Characteristic
$H_T$                       Total transverse energy                   0.3       Energy
$\sqrt{\hat{s}}$            Total $t\bar{t}$ center-of-mass energy    0.3       Energy
$E_{T_1}/H_T$               Leading jet transverse energy fraction    0.5/0.3   Energy
$H_T^{3j}$                  Transverse energy of non-leading jets     0.3       Radiation
$N_{jets}^A$                Weighted number of jets                   0.3       Radiation
$E_{T_{5,6}}$               $E_T$ of 5th and 6th jets                 0.3       Radiation
$A$                         Aplanarity                                0.3       Topology
$S$                         Sphericity                                0.3       Topology
$C$                         Centrality                                0.3       Topology
$\langle\eta^2\rangle$      Rapidity distribution                     0.5       Topology
$p_T^\mu$                   $p_T$ of tagging muon                     -         Event Structure
$\mathcal{F}$               Fisher discriminant based on jet widths   0.5       Event Structure
$\mathcal{M}$               Mass likelihood                           0.5       Event Structure


D. Correlations between parameters

A summary of the 13 parameters used in this analysis
is given in Table III. The first ten parameters are simple
kinematic variables, and are correlated. To quantify the
degree of correlation between any two variables $x$ and $y$,
we define a linear correlation coefficient, $r$, as:

$r = \frac{N\sum xy - \sum x \sum y}{[N\sum x^2 - (\sum x)^2]^{1/2}\, [N\sum y^2 - (\sum y)^2]^{1/2}}$   (6.4)

The value of $r$ ranges from 0, when there is no
correlation, to ±1, when there is complete correlation or
anticorrelation. Table IV shows the average correlations
among the 13 parameters defined in Sec. V and Sec. VI for
data. These are average correlation coefficients; local
correlations can vary significantly, depending upon the
region of multivariate space. Note that the parameters
$p_T^\mu$, $\mathcal{F}$, and $\mathcal{M}$ have relatively small correlations with the
other kinematic parameters.


VII. ANALYSIS


A. Event selection criteria

Before proceeding further with the analysis, basic
quality criteria were applied to the data and to Monte Carlo
events:

• isolated leptons: Events containing an isolated
electron or muon were rejected. This ensured
that our event sample was orthogonal to those used
in the $t\bar{t}$ analyses in other decay channels.

• $H_T^{3j} > 25$ GeV: Removed QCD 2→2 events with
little additional jet activity.

• number of jets: Events with fewer than six
$\mathcal{R}$ = 0.3 cone jets or more than eight $\mathcal{R}$ = 0.5 cone
jets were rejected.

  - By eliminating events with fewer than six
  $\mathcal{R}$ = 0.3 cone jets, the signal-to-background
  ratio is improved. Only 14% of the signal is lost,
  while 36% of the background is rejected. (The
  $E_T$ of the sixth jet is required in the
  calculation of several variables.)

  - Removal of events with more than eight
  $\mathcal{R}$ = 0.5 cone jets also improves
  signal-to-background, rejecting 13% of the background
  and only 5% of the signal. The calculation of
  the $\mathcal{M}$ variable and Fisher discriminant are
  thereby improved because of the reduction in
  the number of jet combinations.

Of the roughly 600,000 events passing our initial
criteria (see Table II), approximately 280,000 events survive
these selection requirements.


B. Muon tagging 

The direct branching fraction of a $b$ quark into a muon
plus anything is 10.7 ± 0.5%. However, when all
contributions from decays of $b$ and $c$ quarks from the two
top quarks are considered, and with a muon acceptance
of about 50%, approximately 20% of the events in the
$t\bar{t} \to$ all-jets mode are expected to yield at least one muon.
Muons in QCD background processes arise mainly from
gluon splitting into $c\bar{c}$ or $b\bar{b}$ pairs, but intrinsic $c\bar{c}$ and
$b\bar{b}$ production as well as in-flight pion and kaon decays
within jets also contribute. These sources occur in only
a small fraction of the events, and therefore only a few
percent of the QCD multijet background events will have
a muon tag.

To take advantage of the difference in the muon tag
rate and enhance the $t\bar{t}$ signal, our analysis requires the
presence of at least one muon near a jet in every event
(“$b$-tagging”). This also provides a means of
estimating the background in a given data sample, which can
be determined purely from data. The $b$-tagging
requirement should give nearly a factor of ten improvement in
signal/background.

TABLE IV. Average correlations among the 13 parameters for data.

                H_T sqrt(s^)   ET1/HT    HT^3j N^A_jets    ET5,6        A        S        C  <eta^2>    pT^mu        F        M
H_T               1     0.80    -0.14     0.71     0.76     0.39     0.01     0.00     0.17    -0.31     0.04    -0.04     0.05
sqrt(s^)                   1    -0.20     0.64     0.64     0.36    -0.16    -0.25    -0.32     0.14     0.01    -0.08     0.05
ET1/HT                              1    -0.54    -0.36    -0.37    -0.34    -0.23     0.07     0.14    -0.02     0.23     0.30
HT^3j                                        1     0.76     0.71     0.25     0.15     0.05    -0.25     0.04    -0.02    -0.10
N^A_jets                                              1     0.44     0.12     0.09     0.09    -0.27     0.04    -0.05    -0.04
ET5,6                                                          1     0.21     0.12     0.02     0.02     0.03    -0.03    -0.10
A                                                                       1     0.58     0.26    -0.30     0.04    -0.07    -0.16
S                                                                                1     0.37    -0.40     0.03    -0.04    -0.14
C                                                                                         1    -0.59     0.05     0.06     0.00
<eta^2>                                                                                            1    -0.05    -0.07     0.03
pT^mu                                                                                                       1    -0.01     0.00
F                                                                                                                    1     0.10
M                                                                                                                             1

Procedures for tagging jets with muons were defined
after extensive Monte Carlo studies of $t\bar{t}$ production in
lepton+jets final states. The requirements used to
select such muon tags are:

• The presence of a fully reconstructed muon track in
the central region ($|\eta| < 1.0$). This restriction does
not have much impact on the acceptance of muons
from $b$ quark jets from $t\bar{t}$ decay because these $b$
quarks tend to be produced mainly at central
rapidities.

• The track must be flagged as a high-quality muon.
This quality is based on a fit to the track in both
the bend and non-bend views of the muon system.

• The signal from the calorimeter in the road defined
by the track must be consistent with the passage
of a minimum ionizing particle. The signal is
measured by energy deposited in the calorimeter cells
along the track.

• Because the $p_T$ spectrum of muons from pion and
kaon decays is softer than from heavy quarks, an
overall $p_T > 4.0$ GeV/$c$ cutoff is imposed to
enhance the signal from heavy quarks. Imposing
this cutoff has limited impact on the $t\bar{t}$ acceptance,
since the muon energy must be greater than about
3.5 GeV in order to penetrate the material of the
calorimeter and the iron toroid at $\eta = 0$.

• The muon must be reconstructed near a jet that
has $|\eta| < 1.0$ and $E_T > 10$ GeV. The distance $\Delta\mathcal{R}$
in $\eta$-$\phi$ space between the muon and the jet axis
must be less than 0.5.

If a muon satisfies the above conditions, the jet
associated with the muon is defined as a $b$-tagged jet, and
the muon is called a tag. Of the roughly 280,000 events
which survived the initial selection criteria, 3853 have at
least one $b$-tagged jet.
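
The tagging requirements above amount to a simple predicate on a muon and the jets in the event. The sketch below is an illustration only: the `Muon` and `Jet` containers and the two boolean quality flags are hypothetical stand-ins for the detector-level quantities, and picking the nearest qualifying jet when several are within range is an assumption of this sketch.

```python
import math
from dataclasses import dataclass


@dataclass
class Muon:
    pt: float              # GeV/c
    eta: float
    phi: float
    high_quality: bool     # hypothetical flag for the track-fit quality requirement
    mip_consistent: bool   # hypothetical flag for the calorimeter MIP requirement


@dataclass
class Jet:
    et: float              # GeV
    eta: float
    phi: float


def delta_r(eta1, phi1, eta2, phi2):
    """Distance in eta-phi space, with the phi difference wrapped into [-pi, pi]."""
    dphi = math.remainder(phi1 - phi2, 2.0 * math.pi)
    return math.hypot(eta1 - eta2, dphi)


def tagged_jet(muon, jets):
    """Return the jet tagged by this muon, or None if the muon is not a tag."""
    if not (abs(muon.eta) < 1.0 and muon.high_quality
            and muon.mip_consistent and muon.pt > 4.0):
        return None
    near = [j for j in jets
            if abs(j.eta) < 1.0 and j.et > 10.0
            and delta_r(muon.eta, muon.phi, j.eta, j.phi) < 0.5]
    return min(near, key=lambda j: delta_r(muon.eta, muon.phi, j.eta, j.phi),
               default=None)
```

An event would then count as tagged if `tagged_jet` returns a jet for at least one of its muons.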


C. Muon tagging rates 


The probability of tagging QCD background events
containing several jets is observed to be just the sum
of the probabilities of tagging individual jets, and is
approximately independent of the nature of the rest of
the event. The muon tagging rate is therefore defined in
terms of probability per jet rather than per event. We
define the muon tagging rate as the ratio of tagged to
untagged jets, allowing us to multiply this function by
the number of untagged events to obtain an estimate of
the tagged background.

Initially, the tagging rate was modeled only as a
function of jet $E_T$. However, it was found subsequently
that there was an $\eta$-dependence to the muon tag rate
which depended on the date of the run. This was traced
to the fact that the muon chambers experienced radiation
damage, and required that some of the wires be cleaned
during the run. Figure 16 shows the relative muon
detection efficiency as a function of the $\eta$ of the jet for
different ranges of runs. Figures 16(a)-16(c) correspond to
the time before the cleaning and Fig. 16(d) to that after
the cleaning ($N_{run} > 89000$). These plots illustrate the
need to account for the dependence on $\eta$ and run number
when performing estimates of tagging rates.

To address this problem, the tag rate for background,
$P_{tag}(E_T, \eta, N_{run})$, was parameterized as a function of jet
$E_T$, jet $\eta$, and the run number, and was assumed
to factorize:

$P_{tag}(E_T, \eta, N_{run}) = f(E_T) \cdot \epsilon(\eta, N_{run})$   (7.1)

where $f(E_T)$ is the relative probability that a jet of given
$E_T$ has a muon tag, and $\epsilon(\eta, N_{run})$ is the relative muon
detection efficiency. The functions $f(E_T)$ and $\epsilon(\eta, N_{run})$
are not normalized individually, but it is the product of
the two which is normalized.

Besides the differences in chamber efficiency caused by
the deterioration and cleaning of wires, there were also
changes in the gas mixtures used in the muon chambers
between the Ia period and Ib (see Table I), and changes
in the high voltage settings, which were implemented at
Run 84000. These required two additional separations
of runs, as shown in Fig. 16. We also found a small
dependence of the tag rate function on $\sqrt{\hat{s}}$ of the entire
event, which is described below.

FIG. 16. The relative muon detection efficiency as a
function of the $\eta$ of the jet, for different ranges of
runs: (a) $N_{run} < 70000$, (b) $70000 < N_{run} < 84000$, (c)
$84000 < N_{run} < 89000$, and (d) $N_{run} > 89000$. The curves
represent the results of polynomial fits.

FIG. 17. The relative probability, $f(E_T)$, for
central jets as a function of the jet $E_T$, for runs in the
range: (a) $N_{run} < 70000$, (b) $70000 < N_{run} < 84000$, (c)
$84000 < N_{run} < 89000$, and (d) $N_{run} > 89000$. The curves
represent the results of a common fit using Eq. 7.2, and saturate
at high jet $E_T$.

The jet $E_T$ factor in the muon tag rate function
($f(E_T)$) is shown in Fig. 17. $f(E_T)$ was parameterized
in two ways, which allowed us to estimate a systematic
error due to the model dependence of this function. The
first parameterization assumed that $f(E_T)$ saturates at
high values of jet $E_T$, and was given by the form:

$f(E_T) = A_0\, \Psi\!\left(\frac{E_T - E_{T_0}}{\Lambda}\right)$,   (7.2)

where $\Psi(x)$ is the normal frequency function (i.e.,
$\Psi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-z^2/2}\, dz$), which approaches one at high jet $E_T$.
The parameters $A_0$, $E_{T_0}$, and $\Lambda$ are obtained from the fits
to the observed tag rates, shown in Fig. 17.

An alternative parameterization of $f(E_T)$ assumed a
polynomial in $\ln(E_T)$, and was given by:

$f(E_T) = a_0 + a_1 \ln(E_T) + a_2 \ln^2(E_T) + a_3 \ln^3(E_T)$.   (7.3)

Here, $f(E_T)$ continues to increase with jet $E_T$, and
the constants $a_0$, $a_1$, $a_2$, and $a_3$ are again obtained
from fits to the observed tagged distributions, shown in
Fig. 18. The difference in the background estimate
between Eq. 7.2 and Eq. 7.3 is discussed in Sec. VII.E.
Because the tagging rate in Eq. 7.3 continues to grow with
increasing jet $E_T$, it gives a slightly larger estimate of
the background than Eq. 7.2. Increasing the tag rate
increases the estimated background, thereby decreasing
the signal. Both versions of $f(E_T)$ give similar fits,
but as our Monte Carlo studies showed that the tag rate
continues to slowly increase with jet $E_T$, even for high
$E_T$, we chose Eq. 7.3 for estimating the background
in this analysis.
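
The qualitative difference between the two parameterizations can be seen in a short sketch. The logarithmic polynomial follows the form given in the text. For the saturating form, the argument of the normal frequency function is assumed here to be a shifted and scaled $E_T$, and the parameter names (`a0`, `et0`, `scale`, `coeffs`) are hypothetical.

```python
import math


def normal_cdf(x):
    """Normal frequency function: integral of the unit Gaussian from -infinity to x."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def f_saturating(et, a0, et0, scale):
    """Saturating form: rises around et0 and approaches a0 at high E_T (assumed argument)."""
    return a0 * normal_cdf((et - et0) / scale)


def f_log_poly(et, coeffs):
    """Cubic polynomial in ln(E_T); does not saturate at high E_T."""
    x = math.log(et)
    a0, a1, a2, a3 = coeffs
    return a0 + a1 * x + a2 * x ** 2 + a3 * x ** 3
```

Evaluating both functions on the same grid of jet $E_T$ values, with parameters taken from a fit, would show where the two background estimates start to diverge.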

Having considered all factors that go into the tag rate
function on a jet-by-jet basis, we looked for dependence
on characteristics of the event as a whole. We observed
a small additional dependence, most notable in
variables that are sensitive to the total energy scale of the
event. Figure 19 shows the muon tag rate in two bins of
$\sqrt{\hat{s}}$, which reflects the total energy of the partonic
collision. The superimposed solid curves represent fits to
Eq. 7.3, but where the coefficients $a_0$, $a_1$, $a_2$, and $a_3$ are
now second-order polynomials in $\sqrt{\hat{s}}$. In Fig. 19(b), the
dashed curve represents the fit at $200 < \sqrt{\hat{s}} < 300$ GeV/$c^2$,
and a small shift in the relative tag rate is apparent. This
$\sqrt{\hat{s}}$ dependence was included through a modification of
the principal $E_T$-dependent part of the function, $f(E_T)$.

As indicated by Eq. 7.1, the observed tag rate is the
product of two parts. Because of this, the fits of Eqs. 7.2
or 7.3 to the observed tag rate are correlated with the
muon detection efficiency. To disentangle the two
components, the fit used data only from central rapidities,
where the detection efficiency was a weak function of $\eta$.
The criterion $\epsilon(\eta, N_{run})/\epsilon(0, N_{run}) > 0.6$ defined the
region used in the fit, corresponding to the region where
the $\eta$-dependence varied least rapidly. Once this initial
$f(E_T)$ was determined, it was necessary to use it to
re-estimate $\epsilon(\eta, N_{run})$. This involved taking the ratio of the
number of observed tagged jets to the number predicted
using the initial $f(E_T)$. This ratio, as a function of $\eta$,
is plotted in Fig. 16 for different run ranges. The
process of fitting $f(E_T)$ and then re-calculating $\epsilon(\eta, N_{run})$
was iterated several times until stable results were
obtained. The final relative probabilities ($f(E_T)$) are shown
in Fig. 17 and Fig. 18, and the final relative efficiency is
shown in Fig. 16. These are labeled relative
probabilities/efficiencies because it is not possible to determine
the overall normalizations of $f(E_T)$ and $\epsilon(\eta, N_{run})$
independently; it is their product which is well determined.

FIG. 18. The relative probability, $f(E_T)$, for
central jets as a function of the jet $E_T$, for runs in the
range: (a) $N_{run} < 70000$, (b) $70000 < N_{run} < 84000$, (c)
$84000 < N_{run} < 89000$, and (d) $N_{run} > 89000$. The curves
represent the results of a common fit using Eq. 7.3, and do not
saturate at high jet $E_T$.

FIG. 19. The relative probability, $f(E_T)$, for central jets as
a function of the jet $E_T$, for (a) $200 < \sqrt{\hat{s}} < 300$ GeV/$c^2$ and
(b) $400 < \sqrt{\hat{s}} < 500$ GeV/$c^2$. Solid curves represent fits using
Eq. 7.3, including a dependence on $\sqrt{\hat{s}}$. The dashed curve
represents the fit at $200 < \sqrt{\hat{s}} < 300$ GeV/$c^2$.

Using Eq. 7.1, the number of expected tagged events
(from background) in a given event sample is

$N_{tag}^{exp} = \sum_{events} \sum_{jets} P_{tag}(E_T, \eta, N_{run})$.   (7.4)


In using Eq. 7.4 to estimate the tagged background, we
assumed that this relation remains valid for extrapolation
from the background region through to the signal region.
These regions will be defined in terms of the neural
network output, in Sec. VII.E. This supposes that there is no
significant correlation between the intrinsic heavy quark
($c\bar{c}$ or $b\bar{b}$) content and the neural network output, apart
from any kinematic correlation through variation in $E_T$
and $\eta$, as parametrized by Eq. 7.4. Therefore, we
attribute any excess of tagged events over the background
predicted by Eq. 7.4 to $t\bar{t}$ production.
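
The background prediction is a double sum of per-jet tag probabilities over untagged events, with each probability taken as the product of an $E_T$-dependent factor and an $\eta$- and run-dependent efficiency. A minimal sketch follows. The event layout (a run number plus a list of jet $(E_T, \eta)$ pairs) and the callables `f` and `eps` are hypothetical placeholders for the fitted functions.

```python
def tag_probability(et, eta, run, f, eps):
    """Factorized per-jet tag probability: f(E_T) * eps(eta, N_run)."""
    return f(et) * eps(eta, run)


def expected_tags(events, f, eps):
    """Expected number of tagged background events.

    events: iterable of (run_number, [(et, eta), ...]) for untagged events.
    f, eps: callables standing in for the fitted tag-rate factors.
    """
    return sum(
        tag_probability(et, eta, run, f, eps)
        for run, jets in events
        for et, eta in jets
    )
```

The same per-jet probabilities can be used as event weights, so that any distribution of the untagged sample can be turned into a predicted distribution for tagged background.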


D. Background modeling 

Since the kinematic variables are calculated using the
jet energies, they are to some extent sensitive to the small
shift in energy due to the presence of the tagged muon
and its associated neutrino. As was described earlier,
jets are measured through the deposition of energy in
the calorimeter, and are not corrected for the muon's
momentum. The neutrino's energy is, of course, missed
completely, and there is no unique prescription for
correcting the jet's energy for the neutrino. However, these
corrections are typically small (of the order of the muon
momentum).

Previous analyses aimed at determining the top
quark mass have incorporated approximate correction
factors for the energies of tagged jets. For our
analysis, such corrections are not strictly needed, and as we
argue below, are disfavored due to the correlations they
introduce between the $E_T$ of the tagged jet and the $p_T$
of the tagging muon. Our procedure consists of
calculating the muon tag rate function (Eq. 7.1) from jets
containing muon tags and untagged jets as follows: we
denote the distribution of untagged jets as a function of
$E_T$ by $U(E_T)$, and the distribution of the tagged jets by
$T(E_T')$. The distribution $U(E_T)$ reflects dominantly QCD
background. Here, $E_T$ is the transverse energy observed
for jets with no observable muon, and thus is on
average the true jet energy; $E_T'$ is the observed energy for
tagged jets, without corrections, and thus is missing the
contributions to the progenitor jets due to the transverse
energy of the muon and neutrino. We formed the ratio
$T(E_T')/U(E_T)$, taking the same numerical values of $E_T'$
and $E_T$. This ratio was then parameterized, as discussed
in Sec. VII.C, to give the tag rate function, $P_{tag}(E_T)$.
The $E_T$ distribution of QCD background events with a
tagged jet, $B(E_T)$, for our analysis was then obtained
using the untagged jet sample $U(E_T)$ from the
expression $B(E_T) = P_{tag}(E_T) \times U(E_T)$, which, apart from the
smoothing applied to the tag rate function, is equivalent
to $B(E_T) = T(E_T')$.

Although there is no a priori advantage to using
uncorrected $E_T'$ instead of corrected $E_T$ for the tagged jets, it
does simplify the background calculation for the neural
network analyses. Our studies show that the $p_T$ of the
muon is uncorrelated with $E_T'$, but not with $E_T$. This
is illustrated in Fig. 20(a), which shows the mean muon
$p_T$ as a function of the tagged jet $E_T'$ for data. A fit to
a straight line gives a slope consistent with zero.
Figure 20(b) shows muon $p_T$ distributions for three distinct
ranges of tagged jet $E_T'$ (chosen to be equally populated);
they are indistinguishable. Similar plots are shown in
Fig. 21 for HERWIG $t\bar{t}$ events. Again, no significant
correlation between muon $p_T$ and tagged jet $E_T'$ is observed.

Since the $p_T$ of the muon is not correlated with the uncorrected jet $E'_T$, it is largely independent of event kinematics, and the probability of finding a muon of a given $p_T$ factorizes from the tag rate function. Tagged background events can therefore be generated by adding "fake" muons to untagged events, assigning each a random $p_T$ value drawn from the observed $p_T$ spectra. The value of $p_T$ enters into the second neural network and must be generated for the modeled background. The $p_T$ distributions for both data (predominantly background) and HERWIG $t\bar{t}$ events were fitted separately to the sum of two exponentials, and the parameterizations from the fits were used in the random generation of muon $p_T$ values for both background and signal. These spectra and the associated fits are shown in Fig. 22. As discussed above, correcting the jets for muon and neutrino $p_T$ would introduce correlations that would complicate the application of the tag rate function; we have consequently not applied such corrections to the jet energies.
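The random generation of "fake" muon $p_T$ values from a sum of two exponentials can be sketched as follows. This is only an illustration: the amplitudes, slopes, and lower cutoff `pt_min` are hypothetical and are not the fitted values, and the function name `sample_muon_pt` is an assumption of this sketch.

```python
import math
import random


def sample_muon_pt(a1, tau1, a2, tau2, pt_min, rng):
    """Draw one p_T from a1*exp(-p/tau1) + a2*exp(-p/tau2), for p >= pt_min."""
    # The integral of each term above pt_min gives its mixture weight.
    w1 = a1 * tau1 * math.exp(-pt_min / tau1)
    w2 = a2 * tau2 * math.exp(-pt_min / tau2)
    tau = tau1 if rng.random() < w1 / (w1 + w2) else tau2
    # An exponential is memoryless, so the truncated draw is a shifted one.
    return pt_min + rng.expovariate(1.0 / tau)


rng = random.Random(12345)
# Hypothetical fit parameters (GeV/c), not the values from the fits in Fig. 22.
fake_pts = [sample_muon_pt(1.0, 2.0, 0.2, 8.0, 4.0, rng) for _ in range(10000)]
```

Each untagged event would receive one such value as the $p_T$ input of its "fake" tag.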

The procedure used for estimating the number of tagged events expected from background can be checked by comparing the distributions of estimated tags to those for the observed tags. Figure 23 shows this comparison for the distributions in each of the 13 parameters used in this analysis, for the entire multijet tagged data sample. In these distributions the $t\bar{t}$ fraction is negligible, as fewer than 40 $t\bar{t}$ events are expected. The predicted rate, absolutely normalized using Eq. 7.4, is shown for all distributions, and consistently reproduces the observed number of tagged events. The values of $\chi^2$ per degree of freedom for the plots in Fig. 23 are given in Table V.
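A comparison of this kind can be sketched in a few lines of Python. The sketch assumes that the error in each bin is the square root of the observed count and that only bins with more than ten observed events are used, as stated in the caption of Table V; the histogram contents and the function name `chi2_comparison` are hypothetical.

```python
def chi2_comparison(observed, predicted, min_events=10):
    """Return (chi2, bins_used) for bins with more than min_events observed."""
    chi2 = 0.0
    bins_used = 0
    for n_obs, n_pred in zip(observed, predicted):
        if n_obs > min_events:
            # Statistical error only: sigma^2 = n_obs (an assumption here).
            chi2 += (n_obs - n_pred) ** 2 / n_obs
            bins_used += 1
    return chi2, bins_used


# Hypothetical observed and predicted tag counts in five bins.
observed = [120, 80, 45, 12, 6]
predicted = [110.0, 85.0, 44.0, 15.0, 5.0]
chi2, bins_used = chi2_comparison(observed, predicted)
```

Because the prediction is absolutely normalized, no parameter is fitted in this sketch, so the number of degrees of freedom is taken equal to the number of bins used.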

Once the background sample is generated, these events are treated exactly as the tagged sample (the sample used to extract signal). The neural network is applied to both sets of events, tagged and modeled background (untagged events with "fake-tags"), and the difference between the two represents an excess that is attributed to the $t\bar{t}$ signal. Similarly, "fake-tags" are applied to the untagged HERWIG $t\bar{t}$ events, and these events are used to model the signal. This effectively increases the statistics of the tagged events in the Monte Carlo $t\bar{t}$ sample.

A correction for the small contamination of the background sample due to $t\bar{t}$ events is made (see Sec. VII.I).

FIG. 20. (a) Mean muon $p_T$ (dots) versus tagged jet $E'_T$ and (b) muon $p_T$ distributions for three jet $E'_T$ ranges (chosen to be equally populated) for data events. The line in (a) is the average of the points. No correlation is observed between the muon $p_T$ and the jet $E'_T$, where $E'_T$ is the observed energy for tagged jets, without corrections (see text).

FIG. 21. (a) Mean muon $p_T$ (dots) versus tagged jet $E'_T$ and (b) muon $p_T$ distributions for three jet $E'_T$ ranges (chosen to be equally populated) for HERWIG $t\bar{t}$ events. The line in (a) is the average of the points. No correlation is observed between the muon $p_T$ and the jet $E'_T$, where $E'_T$ is the observed energy for tagged jets, without corrections (see text).

FIG. 22. Muon $p_T$ distributions for (a) data (predominantly background) and (b) HERWIG $t\bar{t}$ events. The smooth curves are from fits to the sum of two exponentials. The fact that the curve in (a) is below the points for $p_T > 35$ GeV/$c$ does not measurably bias this analysis, because the fraction of events in that region is < 0.6%.

TABLE V. $\chi^2$ per degree of freedom for the plots in Fig. 23. For simplicity, only bins with more than ten events were used and only statistical errors were included in the calculations.

Variable              $\chi^2$ / d.o.f.    Probability of $\chi^2$
$H_T$                 20.1 / 20            0.45
[...]                 25.4 / 25            0.44
$E_{T1}/H_T$          24.1 / 20            0.24
[...]                 17.5 / 22            0.74
$N_{\rm jets}^{A}$    16.9 / 18            0.53
$E_{T5,6}$            26.7 / 25            0.37
$A$                   15.0 / 23            0.89
$S$                   13.7 / 18            0.75
$C$                   10.0 / 18            0.93
[...]                 22.0 / 17            0.18
$p_T$                 18.2 / 26            0.87
$\mathcal{F}$         33.7 / 25            0.11
$M$                   23.6 / 24            0.48

E. Neural network analysis 


Artificial neural networks constitute a powerful extension of conventional methods of multidimensional data analysis [?], and are well suited to our search because they handle information from a large number of inputs and can account for nonlinear correlations between inputs. A neural network is a multivariate discriminant. It typically consists of input nodes, one or more outputs, and intermediary "hidden nodes". The connection between any two nodes is governed by a sigmoidal function which is characterized by a "weight" and a "threshold". The neural network is "trained" by setting the weights and thresholds of the nodes through an optimization algorithm.
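The structure described above can be sketched as a small feed-forward evaluation in Python. This is a generic illustration, not the JETNET implementation: the function names are hypothetical, and the convention that each node applies a sigmoid to its weighted input sum minus its threshold is an assumption of the sketch.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, thresholds):
    """Each node applies a sigmoid to (weighted sum of inputs - threshold)."""
    return [sigmoid(sum(w * x for w, x in zip(node_w, inputs)) - theta)
            for node_w, theta in zip(weights, thresholds)]


def network_output(inputs, hidden_w, hidden_t, out_w, out_t):
    """Inputs -> hidden nodes -> a single output between 0 and 1."""
    hidden = layer(inputs, hidden_w, hidden_t)
    return layer(hidden, [out_w], [out_t])[0]


# Two inputs and two hidden nodes with hypothetical (untrained) parameters.
example = network_output([1.0, 2.0],
                         [[0.5, -0.3], [0.1, 0.8]], [0.2, -0.1],
                         [1.2, -0.7], 0.05)
```

Training would adjust the weights and thresholds so that the output approaches 1 for signal events and 0 for background events.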

The output of the neural network is simply a mapping between the multidimensional space described by our kinematic input variables and a one-dimensional output space. Setting a threshold on the output of the neural network corresponds to a set of hypersurface cuts in multidimensional input space. Consequently, the neural network output may be employed to discriminate between signal and background as long as the following conditions are observed:

• The neural network is trained on event samples that are independent of the sample used for the measurement.

• There is a reliable method for determining the background level for a given value of neural network output.


Independence of the training sample and the sample used to extract the $t\bar{t}$ signal is maintained by considering only $b$-tagged events in the final extraction of a signal for $t\bar{t}$ production. Events that did not have a $b$-tagged jet are used for training and for defining the background sample.

In order to simulate the background, untagged events were made to resemble tagged events by adding muon tags to one of the jets in the event. With such "fake" muons, these events were taken to represent the background. The prescription for adding these muons to the untagged jets was described in Sec. VII.D. A subset of these events was used to train the neural network response to background.

The set of 13 parameters (see Table III) was used as the set of input nodes in training the neural network. Because training time increases markedly and quality of convergence decreases with the number of input nodes and hidden layers, the problem was simplified by first training a neural network using the first ten kinematic variables. These variables tended to be more highly correlated than the remaining three (see Sec. VI). Based on studies using our training samples, we chose to have 20 hidden nodes and one network output, and used the back-propagation learning algorithm in JETNET [?]. The output of this neural network and the remaining three parameters were used as inputs to a second neural network. Here, we chose eight hidden nodes and one network output.

Events used to train the two neural networks were selected as follows. A simpler initial network (NN$_0$), using a subset of seven kinematic parameters (excluding [...], $E_{T1}/H_T$, and $\langle\eta^2\rangle$), was trained using all events. The output of this network, for both data and HERWIG $t\bar{t}$ Monte Carlo, is shown in Fig. 24. Figure 24 shows that the $t\bar{t}$ signal tends to peak at values of neural network output near 1 (the "signal region"), whereas the background events peak near 0 (the "background region"). For the final training samples, we selected data and $t\bar{t}$ Monte Carlo events having NN$_0$ > 0.3. This neural network was used only for choosing the best training samples, and was not employed in the final analysis (i.e., all events were reanalyzed). Removing events that were very unlikely $t\bar{t}$ candidates (NN$_0$ < 0.3) improved the efficiency of the training and increased network sensitivity to background events that more closely mimic $t\bar{t}$ event characteristics, thereby improving signal-to-background discrimination in the final analysis.

Training of the two neural networks used in the final analysis proceeded as follows. The first neural network (NN$_1$) was trained on the ten kinematic variables using the training sets, as described above. The output of NN$_1$ and the remaining three variables were then used as inputs to the second neural network (NN$_2$). NN$_2$ was trained using tagged HERWIG $t\bar{t}$ Monte Carlo events and "fake" tagged data, also described in Sec. VII.D.

FIG. 24. Initial training of the neural network (NN$_0$). The network output is shown for (a) data, and (b) HERWIG $t\bar{t}$ Monte Carlo for $m_t = 180$ GeV/$c^2$.

FIG. 25. The distributions in final neural network (NN$_2$) output for (a) data (diamonds) and expected background (histogram) and (b) HERWIG $t\bar{t}$ signal for $m_t = 180$ GeV/$c^2$.
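The two-stage arrangement described in the text (ten inputs and 20 hidden nodes for the first network, then its output plus three further variables and eight hidden nodes for the second) can be sketched as follows. The weights here are random and untrained, and the helper names `make_net`, `evaluate`, and `nn2_output` are hypothetical; the sketch only shows how the pieces connect, not the trained networks of the analysis.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_net(n_in, n_hidden, rng):
    """Random (untrained) weights for an n_in -> n_hidden -> 1 network."""
    return {
        "hidden": [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)],
        "out": [rng.uniform(-1, 1) for _ in range(n_hidden + 1)],
    }


def evaluate(net, inputs):
    # The last weight of each node acts as its threshold (bias) term.
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + ws[-1])
              for ws in net["hidden"]]
    out = net["out"]
    return sigmoid(sum(w * h for w, h in zip(out, hidden)) + out[-1])


rng = random.Random(0)
nn1 = make_net(10, 20, rng)  # ten kinematic variables, 20 hidden nodes
nn2 = make_net(4, 8, rng)    # NN1 output + three remaining variables


def nn2_output(kinematic10, remaining3):
    """Feed the NN1 output and the three remaining variables into NN2."""
    return evaluate(nn2, [evaluate(nn1, kinematic10)] + list(remaining3))
```

Splitting the problem this way keeps each network small: the first has ten inputs, the second only four.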


F. Cross section using neural network fits 

The $t\bar{t}$ cross section, integrated over all values of neural network output, is determined from the distributions in the output of the final neural network. Any excess of the tagged data over the modeled background distribution is attributed to $t\bar{t}$ production. This excess, integrated over all values of neural network output, is independent of the neural network, and depends only on the accuracy of the modeling of the background by the tag rate function. If the excess appears in the region of neural network output populated by $t\bar{t}$ signal, these events are likely $t\bar{t}$ candidates. The final neural network (NN$_2$) distributions for the data and the expected background are shown in Fig. 25(a), and for HERWIG $t\bar{t}$ events in Fig. 25(b). The normalization of the $t\bar{t}$ signal is described below. These distributions demonstrate a strong discrimination between signal and background.

We extract the cross section from a fit to the data of the sum of the neural network output distributions expected for the $t\bar{t}$ signal and for QCD multijet background. Because the shapes of the $t\bar{t}$ and QCD network output distributions differ significantly, the relative amounts of each can be disentangled. The generated HERWIG $t\bar{t}$ events were arbitrarily normalized assuming $\sigma_{t\bar{t}} = 6.4$ pb at each top quark mass. This value needs to be factored out in normalizing Fig. 25(b). The data of Fig. 25(a) are fitted using $\chi^2$ minimization to the hypothesis:

$$N_i^{\rm expected} = A_{\rm bkg}\, N_i^{\rm bkg} + \frac{\sigma_{t\bar{t}}}{6.4~{\rm pb}}\, N_i^{t\bar{t}} \qquad (7.5)$$

where $N_i^{\rm bkg}$ is the expected number of background events in bin $i$, and $N_i^{t\bar{t}}$ is the expected signal in this bin. Because the full Monte Carlo sample, scaled to the total number of events (given by 6.4 pb multiplied by the integrated luminosity), is subjected to exactly the same trigger and selection criteria as the data, $N_i^{t\bar{t}}$ accounts for the luminosity, branching ratio (BR), and $t\bar{t}$ efficiency of our selection criteria. Both $A_{\rm bkg}$, the background normalization factor, and $\sigma_{t\bar{t}}$ are obtained from the fit, along with their respective statistical errors. The results of this fit are shown in Fig. 26.

FIG. 26. The distribution in neural network (NN$_2$) output for data (diamonds) and the fits for expected signal and background. The signal was modeled with HERWIG for $m_t = 180$ GeV/$c^2$. The errors shown are statistical.
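Because the hypothesis in Eq. 7.5 is linear in $A_{\rm bkg}$ and in the signal scale factor, a $\chi^2$ fit of this kind has a closed-form solution. The following Python sketch illustrates it under the assumption that the error in each bin is the square root of the observed count; the templates, the data, and the function name `fit_templates` are hypothetical and are not the distributions of Fig. 25.

```python
def fit_templates(data, bkg, sig):
    """Chi-square fit of data_i ~ a*bkg_i + s*sig_i with errors sqrt(data_i).

    Returns (a, s): background normalization and signal scale factor.
    """
    sbb = sbs = sss = sbd = ssd = 0.0
    for n, b, t in zip(data, bkg, sig):
        w = 1.0 / n  # 1/sigma^2 with sigma = sqrt(n); assumes n > 0
        sbb += w * b * b
        sbs += w * b * t
        sss += w * t * t
        sbd += w * b * n
        ssd += w * t * n
    # Solve the 2x2 normal equations.
    det = sbb * sss - sbs * sbs
    a = (sbd * sss - ssd * sbs) / det
    s = (ssd * sbb - sbd * sbs) / det
    return a, s


# Hypothetical templates; the signal template is normalized to 6.4 pb.
bkg = [400.0, 150.0, 40.0, 10.0, 3.0]
sig = [1.0, 2.0, 4.0, 6.0, 8.0]
data = [1.07 * b + 1.1 * t for b, t in zip(bkg, sig)]

a_bkg, scale = fit_templates(data, bkg, sig)
sigma_tt = scale * 6.4  # cross section in pb
```

The fitted scale multiplies the 6.4 pb used to normalize the Monte Carlo, which is how that arbitrary normalization is factored out.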

By allowing the signal and background normalization factors to be determined from the fit, this method simultaneously provides the $t\bar{t}$ cross section and a more sensitive measurement of the background normalization. It efficiently exploits all information about the $t\bar{t}$ cross section and background normalization from the entire range of neural network output, without choosing any particular cutoff on neural network output. The distributions for signal, background and data are shown separately in Fig. 26. The error bars are the square root of the number of data events in each bin.

Events at the lowest values of neural network output (< 0.02) have been removed, leaving 2207 events, or slightly more than half of the tagged data sample. The resulting fits may be checked by varying the region of NN$_2$ used. (Fig. 26 uses events with NN$_2$ > 0.02.) Figure 27 shows results for $A_{\rm bkg}$ and $\sigma_{t\bar{t}}$ as a function of the lower limit in NN$_2$ employed in the fit. The results are seen to be quite stable to the change of this lower limit. We note that the jets in events with NN$_2$ < 0.02 tend to have low $E_T$, where the tagging rate may not be as well determined due to the low tagging probability. Because the background modeling may be less accurate in the very low NN$_2$ region, where the background so strongly dominates the data distribution, we impose a cut of NN$_2$ > 0.02 for our fits to $A_{\rm bkg}$ and $\sigma_{t\bar{t}}$. The stability of the results shown in Fig. 27 supports this choice.

A similar plot was produced and fitted for several top quark masses, with the cross section at each mass obtained using the output distribution for HERWIG $t\bar{t}$ events generated at that mass. The results are shown in Table VI. Interpolating to the value for the top quark mass as measured by D0 [?] ($m_t = 172.1 \pm 7.1$ GeV/$c^2$), we obtain $\sigma_{t\bar{t}} = 7.1 \pm 2.8$ (stat) pb.

FIG. 27. Results of combined fits (as in Fig. 26) when data points are removed at small values of neural network output. The refitted (a) background normalization and (b) $t\bar{t}$ cross section are plotted as a function of the number of points eliminated. Error bars are statistical, but are correlated through the error matrix.

TABLE VI. Results of the fits to neural network output.

Top Quark Mass (GeV/$c^2$)    $A_{\rm bkg}$    $\sigma_{t\bar{t}}$ (pb)    $\chi^2$ / d.o.f.
140                           1.05 ± 0.03      18.4 ± 7.8                  17.6 / 17
160                           1.06 ± 0.03      9.3 ± 3.8                   17.2 / 17
170                           1.07 ± 0.02      7.2 ± 3.0                   17.1 / 17
180                           1.07 ± 0.03      6.3 ± 2.5                   16.9 / 17
200                           1.07 ± 0.03      5.1 ± 2.0                   16.8 / 17
220                           1.07 ± 0.03      4.4 ± 1.7                   16.7 / 17

Fitting the data in Fig. 26 only to the background ($\sigma_{t\bar{t}}$ forced to zero) changes the normalization to 1.09 ± 0.03, and the total $\chi^2$ per degree of freedom to 23.1/18. We note that the change in $\chi^2$ comes predominantly from the last three bins of neural network output (in Fig. 26), and the probability for a change in $\chi^2$ of 6.2 (for $m_t = 180$ GeV/$c^2$) for one additional degree of freedom is consistent with the significance of the extracted cross section, which is 2.5 standard deviations from zero.
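The consistency noted above can be checked with a few lines of arithmetic. The sketch below uses the $\chi^2$ values quoted in the text and in Table VI for $m_t = 180$ GeV/$c^2$, and relies on the standard relation that, for one additional degree of freedom, a change in $\chi^2$ equals the square of a Gaussian deviation; the variable names are hypothetical.

```python
import math

chi2_bkg_only = 23.1  # background-only fit, 18 d.o.f. (from the text)
chi2_full = 16.9      # signal + background fit, 17 d.o.f. (Table VI)
delta_chi2 = chi2_bkg_only - chi2_full

# For one extra degree of freedom, delta chi2 = z^2 for a Gaussian z.
z_from_chi2 = math.sqrt(delta_chi2)
# Tail probability of a chi2 distribution with one degree of freedom.
p_value = math.erfc(math.sqrt(delta_chi2 / 2.0))

# Independent cross-check: fitted cross section over its statistical error.
z_from_fit = 6.3 / 2.5
```

Both estimates come out close to 2.5 standard deviations, in line with the statement in the text.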


G. Cross section using counting method 

The traditional method for extracting the $t\bar{t}$ cross section served as a useful check on the above procedure. We assumed an absolute normalization of the background as given by the tag rate function. Taking the excess in observed events (seen in Fig. 28) to be from $t\bar{t}$ production, we calculate the cross section for the process using the conventional relation:

$$\sigma_{t\bar{t}} = \frac{N_{\rm obs} - N_{\rm bkg}}{\epsilon \times {\rm BR} \times \mathcal{L}} \qquad (7.6)$$

where $N_{\rm obs}$ is the number of observed events with neural network output greater than some threshold, $N_{\rm bkg}$ is the corresponding number of expected background events, $\epsilon \times {\rm BR}$ is the branching ratio (BR) times the efficiency ($\epsilon$) of the criteria used for selecting $t\bar{t}$ events, and $\mathcal{L}$ is the total integrated luminosity (110.3 ± 5.8 pb$^{-1}$). We use HERWIG as the model for calculating the value of $\epsilon \times {\rm BR}$.

TABLE VII. Number of observed events, expected background, observed excess, and expected signal (assuming $m_t = 180$ GeV/$c^2$ and $\sigma_{t\bar{t}} = 6.4$ pb), for the threshold on the neural network output shown in Fig. 28.

Observed Number of Events    Expected Background Events    Observed Excess of Events    Expected HERWIG $t\bar{t}$ Events
41                           24.8 ± 2.4                    16.2                         15.9 ± 2.6
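Equation 7.6 can be evaluated directly from the entries of Table VII. In the sketch below, the value of $\epsilon \times {\rm BR}$ is inferred from the expected signal yield at the assumed 6.4 pb; that inference, and the function name `counting_cross_section`, are assumptions of this illustration rather than the procedure used in the analysis.

```python
LUMINOSITY = 110.3     # integrated luminosity in pb^-1, from the text
SIGMA_GENERATED = 6.4  # pb, normalization assumed for the HERWIG sample


def counting_cross_section(n_obs, n_bkg, eff_times_br, luminosity=LUMINOSITY):
    """Eq. 7.6: (N_obs - N_bkg) / (eff x BR x L), in pb."""
    return (n_obs - n_bkg) / (eff_times_br * luminosity)


# Table VII (m_t = 180 GeV/c^2): 41 observed events, 24.8 expected
# background, and 15.9 signal events expected for a 6.4 pb cross section.
eff_times_br = 15.9 / (SIGMA_GENERATED * LUMINOSITY)
sigma = counting_cross_section(41, 24.8, eff_times_br)
```

The result is about 6.5 pb with $\epsilon \times {\rm BR}$ close to 0.022, consistent with the rounded entries for $m_t = 180$ GeV/$c^2$ in Table VIII.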

The number of events, as a function of the threshold placed on the output of the neural network, is shown in Fig. 28(a). The error bars are the square root of the number of events in each bin. The upper smooth curve in Fig. 28(a) represents the sum of the expected signal and background, and the lower curve is just the expected background. The statistical error in the cross section depends upon where the threshold is placed. A plot of the relative statistical error versus the threshold on the output of the neural network is shown in Fig. 28(b). The fractional error $\mathcal{E}$ is approximated by:

$$\mathcal{E} = \frac{\sqrt{N_{t\bar{t}} + N_{\rm bkg}}}{N_{t\bar{t}}} \qquad (7.7)$$

where $N_{t\bar{t}}$ and $N_{\rm bkg}$ are the expected numbers of $t\bar{t}$ and background events above the neural network threshold. We wished to place the final threshold at or near the minimum error, and chose 0.85, as shown in Fig. 28(b). The number of events above this threshold, the expected background, and the expected signal are shown in Table VII.

Using Eq. 7.6, Table VIII lists the efficiency times branching ratio for two input top quark mass values, and the extracted $t\bar{t}$ cross sections. We note that the method in Sec. VII.F gave $t\bar{t}$ cross sections of 7.2 and 6.3 pb for $m_t$ of 170 and 180 GeV/$c^2$, respectively, in good agreement with the values in Table VIII. When interpolated to the measured top quark mass of 172.1 GeV/$c^2$, this determination yields a cross section of 7.3 ± 3.0 ± 1.6 pb. The results from the fit to the neural network are slightly lower, as one would expect, since the background normalization was 1.07 (instead of being fixed to 1 here).

FIG. 28. (a) The number of events (data) above any threshold on the neural network, and (b) the expected fractional error on the $t\bar{t}$ cross section as a function of the threshold placed on the neural network output. The vertical line at 0.85 indicates the chosen threshold. The smooth curves in (a) represent the sum of the expected number of signal and background events (assuming $m_t = 180$ GeV/$c^2$ and $\sigma_{t\bar{t}} = 6.4$ pb) and the expected number of background events only.
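The threshold choice based on Eq. 7.7 can be sketched as follows. The yields at 0.85 are those of Table VII; the yields at the other thresholds are hypothetical values chosen only to illustrate the scan, and the function names are assumptions of this sketch.

```python
import math


def fractional_error(n_sig, n_bkg):
    """Eq. 7.7: sqrt(N_tt + N_bkg) / N_tt."""
    return math.sqrt(n_sig + n_bkg) / n_sig


def best_threshold(thresholds, n_sig_above, n_bkg_above):
    """Return the threshold with the smallest expected fractional error."""
    rows = zip(thresholds, n_sig_above, n_bkg_above)
    return min(rows, key=lambda row: fractional_error(row[1], row[2]))[0]


# Expected yields above the 0.85 threshold, from Table VII.
err_at_085 = fractional_error(15.9, 24.8)

# Hypothetical yields at other thresholds (illustration only).
thresholds = [0.5, 0.7, 0.85, 0.95]
n_sig_above = [30.0, 22.0, 15.9, 6.0]
n_bkg_above = [400.0, 110.0, 24.8, 8.0]
chosen = best_threshold(thresholds, n_sig_above, n_bkg_above)
```

The value at 0.85 is about 0.40, comparable to the relative statistical error of the counting-method cross section for $m_t = 180$ GeV/$c^2$ in Table VIII (2.6/6.5).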


TABLE VIII. Cross sections for $t\bar{t}$ production, using the counting method, obtained from the $b$-tagged events for $m_t$ = 170 and 180 GeV/$c^2$.

$m_t$ (GeV/$c^2$)    Signal Efficiency × BR    Cross Section (pb)
170                  0.019 ± 0.0032            7.5 ± 3.1 ± 1.6
180                  0.022 ± 0.0037            6.5 ± 2.6 ± 1.4

The changes in efficiencies as a function of top quark mass reflect the sensitivity of the selection criteria to the input mass $m_t$. The statistical and systematic uncertainties in the cross sections are discussed in Sec. VII.I.


H. Double-tagged events 


The requirement of a second $b$-tagged jet in the event further reduces the background, thereby increasing the signal-to-background ratio. Unfortunately, the additional requirement significantly reduces the expected yield. However, the search for these "double-tagged" events serves as a consistency check of the single-tag analysis, and also as a test of the model for the background. The number of events that contain two $b$-tagged jets is shown in Table IX for various NN$_2$ thresholds. The two $b$-tags are required to originate from separate jets; two tags within the same jet are counted as a single tag. The higher muon $p_T$ is used as the input to the neural network. The background is again calculated based on Eq. 7.1, where the tag rate function $P_{\rm tag}$, summed over all jets, represents the expected number of tags in the event. The double-tag probability is obtained via the Poisson distribution, and is the likelihood of observing at least two tagged jets, given the expected number. This follows since the tag rate function is a rate per jet, and, within our model, the two tagged jets are uncorrelated.
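The Poisson probability of observing at least two tags, given the expected number of tags in an event, is $1 - e^{-\mu}(1 + \mu)$. A minimal Python sketch follows; the per-jet tag rates are hypothetical values, and the function name is an assumption of the sketch.

```python
import math


def prob_at_least_two_tags(jet_tag_rates):
    """Poisson P(n >= 2) with mean equal to the summed per-jet tag rates."""
    mu = sum(jet_tag_rates)
    return 1.0 - math.exp(-mu) * (1.0 + mu)


# Hypothetical per-jet tag rates for a six-jet event.
p_double = prob_at_least_two_tags([0.005, 0.004, 0.004, 0.003, 0.002, 0.002])
```

For small expected numbers the result is close to $\mu^2/2$, which is why the double-tag requirement suppresses the background so strongly.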

We make the assumption that the fraction of double-tagged events from correlated sources, such as direct heavy-quark pair production ($c\bar{c}$ or $b\bar{b}$), remains unchanged over the entire range of the neural network output variable. This assumption is motivated by the fact that the energy scales in such events are well above the energy thresholds for heavy-quark pair production, and therefore the fraction of these events should be independent of the neural network output. The good agreement between the background model and data in the single-tagged channel supports this assumption.

We determine the normalization of the background by fitting the neural network output distribution to the expected background and signal contributions as in Sec. VII.F. The 32 events were binned in neural network output, and the log-likelihood calculated. The minimum in negative log-likelihood occurs for a background normalization factor of 0.97^018, where the errors correspond to a change in log-likelihood of 1/2. In determining this normalization, the expected $t\bar{t}$ signal was not varied, but the result is insensitive to this value. Allowing the data to determine the normalization through this fit accommodates the possibility that the tag rate function for the second muon in the event is different from that for the first muon. The two errors on the expected background in Table IX represent the uncertainties due to the tag rate function, $t\bar{t}$ subtraction and $E_T$ scale (see Sec. VII.I) and the normalization error, respectively.
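A binned likelihood fit for the background normalization, with the signal held fixed, can be sketched as a simple scan. The binning and bin contents below are hypothetical (only the total of 32 events is taken from the text), and the function names are assumptions of this sketch.

```python
import math


def nll(a_bkg, data, bkg, sig):
    """Poisson negative log-likelihood (constant terms dropped),
    with expectation mu_i = a_bkg * bkg_i + sig_i in each bin."""
    total = 0.0
    for n, b, s in zip(data, bkg, sig):
        mu = a_bkg * b + s
        total += mu - n * math.log(mu)
    return total


def scan_normalization(data, bkg, sig, lo=0.2, hi=2.0, steps=1800):
    """Grid scan: best a_bkg and the interval where NLL rises by <= 1/2."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    values = [nll(a, data, bkg, sig) for a in grid]
    v_min = min(values)
    best = grid[values.index(v_min)]
    inside = [a for a, v in zip(grid, values) if v - v_min <= 0.5]
    return best, min(inside), max(inside)


# Hypothetical binned contents (32 events in total, illustration only).
data = [13, 8, 6, 5]
bkg = [14.0, 8.0, 4.5, 2.2]
sig = [0.4, 0.6, 0.8, 0.9]
best, lower, upper = scan_normalization(data, bkg, sig)
```

The interval returned by the scan is generally asymmetric about the best value, which is why two errors are quoted for a fit of this kind.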

We note that the fitted normalization is consistent with that for the single-tagged sample, indicating that the second muon tag probability is roughly the same as for the first. The total number of events for NN$_2$ > 0.02 is in good agreement with the sum of expected background plus the small contribution from top. The small excess persists as the NN$_2$ threshold is increased, in agreement with expectations. The double-tag analysis supports our conclusion that the excess in the singly-tagged sample is due to $t\bar{t}$ production.

TABLE IX. Number of observed double-tagged events, expected background, observed excess, and expected signal (assuming $m_t$ = 180 GeV/$c^2$ and $\sigma_{t\bar{t}} = 6.4$ pb), versus the threshold on the neural network output. The first error in the expected background is due to the errors in the tag rate function, $t\bar{t}$ correction, and the $E_T$ scale uncertainties. The second error is due to the uncertainty in the fitted background normalization factor, and is assumed to be fully correlated at different NN$_2$ values.

NN$_2$ Threshold    Observed Number of Events    Expected Background Events    Observed Excess of Events    Expected HERWIG $t\bar{t}$ Events
0.02                32                           28.7 ± 5.5 ± 5.7              3.3                          2.7
0.1                 22                           16.6 ± 3.2 ± 3.3              5.4                          2.7
0.2                 17                           11.8 ± 2.3 ± 2.3              5.2                          2.7
0.4                 12                           6.8 ± 1.3 ± 1.4               5.2                          2.5
0.6                 7                            3.5 ± 0.7 ± 0.7               3.5                          2.1
0.8                 3                            1.1 ± 0.2 ± 0.2               1.9                          1.4
0.85                2                            0.7 ± 0.1 ± 0.1               1.3                          1.2
0.9                 1                            0.4 ± 0.1 ± 0.1               0.6                          1.0


I. Corrections and uncertainties


In this subsection we discuss the major sources of systematic uncertainty that affect either the background estimate or signal efficiency. The statistical errors on the cross section and background normalization come directly from the fit (Eq. 7.5) shown in Fig. 26.

• The statistical error in the calculation of the background is estimated by the number of untagged events falling in the signal region. This estimate of 24.8 events, and an approximate mean tagging rate of 2%, implies of the order of 1240 untagged events for the background, and a consequent 3% statistical uncertainty in the background estimate. This contributes a 4% uncertainty in the cross section based on the counting method in Eq. 7.6.


• The error in the normalization of the tagging rate was taken from the combined fits to the output of the neural networks using Eq. 7.5. This error is shown in Fig. 27(a), and was taken to be 5%. It is used only in the calculation of the error on the background, as it is already included in the statistical error on the cross section. (The statistical error on the cross section was obtained from a simultaneous fit to the normalization of both background and signal, and accounts for the error on the background normalization.)

• The uncertainty in the parameterization of the tagging rate results in a 5% uncertainty in the predicted number of background events. This was estimated by comparing the predicted number of tags for two functional forms (Eq. 7.[?] and Eq. 7.3) assumed for the tag rate. Unlike the normalization of the tagging rate, this error accounts for possible changes in the shape of the background as a function of neural network output. This results in a 7% uncertainty in the $t\bar{t}$ cross section.


• The presence of $t\bar{t}$ events in the data used for estimating background has been taken into account in all results presented thus far. The correction to the background is estimated as follows. Calling $N_{\rm bkg}^{t\bar{t}}$ the number of untagged $t\bar{t}$ events wrongly assigned to the background estimate, we can estimate $N_{\rm bkg}^{t\bar{t}}$ as:

$$N_{\rm bkg}^{t\bar{t}} = [\ldots]\,(N_{\rm obs} - N_{\rm bkg})\, f_{\rm tag} \qquad (7.8)$$

where the factor [...] corrects the $b$-tagged signal back to the untagged signal (recall that $t\bar{t}$ events are tagged roughly 20% of the time), $f_{\rm tag}$ is the average tag rate per event, and $N_{\rm obs}$ and $N_{\rm bkg}$ refer to events in the final tagged data sample. The corrected background estimation therefore becomes:

$$N_{\rm bkg}({\rm corr}) = N_{\rm bkg} - N_{\rm bkg}^{t\bar{t}} \qquad (7.9)$$

This correction is applied bin by bin in Fig. [?], and is approximately 4% in the signal region. We therefore assign a systematic uncertainty of 4% to the background estimate and a corresponding 6% to the $t\bar{t}$ cross section.


• Because untagged events, when multiplied by the tag rate function, model the tagged background, the $E_T$ scale of both sets must be the same. Any mismatch between these can produce subtle differences in the scales of the kinematic variables. A useful measure of this scale is the mean $H_T$. We observe that the difference in mean $H_T$ between our data and background model is 1.5 ± 1.4 GeV (see Fig. 23(a)), which is consistent with no mismatch. We take 1.4 GeV to be the uncertainty in the energy scale of the background model. This 1.4 GeV is added to one of the jets (we arbitrarily choose the jet with highest $E_T$), event-by-event, in the background calculation and the analysis is redone. The resultant change is 4.2% in the background and 9.1% in the cross section.

• The statistical error in the $t\bar{t}$ efficiency is 3.2%.

• Any difference in the turn-on of the trigger efficiency for data and for $t\bar{t}$ Monte Carlo events can affect the signal efficiency. The difference can originate, for example, from the modeling of electronic noise or from the simulation of the underlying event. Furthermore, this efficiency can depend upon the mass of the top quark. From our trigger simulations, we estimate < 5% uncertainty in signal efficiency from such sources [?].

• The uncertainty in the integrated luminosity was taken to be 5.3%. This arises mainly from the uncertainty in the absolute luminosity, and affects all runs systematically.

• Any difference in the relative energy scale between data and Monte Carlo affects the efficiency for signal. This uncertainty was determined using the MPF method [?], as described in Sec. IV.C. Varying the energy scale in the $t\bar{t}$ Monte Carlo by ± (4% + 1 GeV) [?] changes the efficiency for signal by ± 5.7%.





FIG. 29. Fractional differences in efficiencies between ISAJET and HERWIG, $(\epsilon_{\rm ISAJET} - \epsilon_{\rm HERWIG})/\epsilon_{\rm HERWIG}$, for $m_t = 180$ GeV/$c^2$ (a) as a function of threshold on $H_T$, (b) as a function of threshold on $H_T^{3j}$, (c) as a function of threshold on aplanarity, and (d) as a function of threshold on $C$ (centrality).


• The $t\bar{t}$ tag rate is based on the $t\bar{t}$ Monte Carlo, but assumes that the performance of all detector components was stable during the run. The Monte Carlo acceptance was reduced by 7.0% to correct mainly for muon detection inefficiencies that were not modeled in our simulation. We estimate a 7.0% uncertainty in the $t\bar{t}$ efficiency from any such changes in the muon tag rate.

• Uncertainty in the model for $t\bar{t}$ production is estimated by comparing $t\bar{t}$ predictions from the ISAJET and HERWIG generators. Figure 29 shows the fractional differences in efficiencies, $(\epsilon_{\rm ISAJET} - \epsilon_{\rm HERWIG})/\epsilon_{\rm HERWIG}$, for different thresholds on $H_T$, $H_T^{3j}$, aplanarity and $C$ (again, for $m_t = 180$ GeV/$c^2$). Although the two generators differ significantly in the tails of these distributions, on average they are in reasonable agreement. The systematic error was estimated by repeating the analysis using events generated with ISAJET. In order to remove the effects of the Fisher discriminant ($\mathcal{F}$), which is not well modeled in ISAJET, $\mathcal{F}$ values were randomly chosen based on a parameterization of the HERWIG $t\bar{t}$ $\mathcal{F}$ distribution. To further remove the dependence on the tag rate, randomly generated values of muon $p_T$ were taken. The expected distributions for the two generators, normalized as before, are shown in Fig. 30. Identical thresholds were placed on the neural network output. The cross section changed by 6.2%, which we take as the uncertainty in the overall signal efficiency due to $t\bar{t}$ model dependence.

FIG. 30. Expected distributions in final neural network output (NN$_2$) for HERWIG $t\bar{t}$ signal and ISAJET $t\bar{t}$ signal for $m_t = 180$ GeV/$c^2$.

• The 6% uncertainty in the $b \to \mu$ branching fraction corresponds to an average over the produced $B$ mesons. This 6% enters directly into the acceptance error in the Monte Carlo.

• The $p_T$ of the tagged muon enters as an input to the neural network. The mean $p_T$ in HERWIG $t\bar{t}$ events was 14.7 GeV/$c$, while in ISAJET it was 15.9 GeV/$c$, an 8% difference. Rescaling the muon $p_T$ in HERWIG by 8% changes the cross section by 7.0%, which is taken as a systematic error.

• The uncertainty resulting from the modeling of the Fisher discriminant for the jet widths, $\mathcal{F}$, was estimated by comparing data to our HERWIG QCD Monte Carlo. The mean value of $\mathcal{F}$ in data was 0.0470 ± 0.0002 and in HERWIG QCD it was 0.0488 ± 0.0019. The difference of 0.0018 ± 0.0019 indicates that our modeling is reasonable. The uncertainty on this result, 0.0019, was systematically added to the value of $\mathcal{F}$, event-by-event, in the HERWIG $t\bar{t}$ generator, and the cross section recalculated. The observed change in the cross section of 2.0% is used as the systematic error from this variable.
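The arithmetic behind the first item in the list above (the statistical error on the background estimate) can be sketched as follows. The propagation to the cross section assumes that, in Eq. 7.6, a shift in $N_{\rm bkg}$ changes the cross section in proportion to $N_{\rm obs} - N_{\rm bkg}$; the variable names are hypothetical.

```python
import math

n_bkg = 24.8          # expected background above threshold (Table VII)
n_obs = 41            # observed events above threshold (Table VII)
mean_tag_rate = 0.02  # approximate mean tagging rate quoted in the text

# Number of untagged events feeding the background estimate.
n_untagged = n_bkg / mean_tag_rate
# Poisson fractional error on the background estimate.
bkg_stat_frac = 1.0 / math.sqrt(n_untagged)

# A background shift d(N_bkg) moves the cross section by
# d(N_bkg) / (N_obs - N_bkg), under the assumption stated above.
xsec_frac = bkg_stat_frac * n_bkg / (n_obs - n_bkg)
```

This gives roughly 1240 untagged events, a 3% error on the background, and a 4% error on the cross section, matching the values quoted in that item.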

The sizes of the above effects are summarized in Table X for the uncertainties in the background, and in Table XI for the cross section. Adding both statistical and systematic errors in quadrature, we estimate the background as 24.8 ± 2.4 events (see Table VII). Similarly, the uncertainty in the efficiency of the $t\bar{t}$ signal is calculated from the errors in Table XI.
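The quadrature sums can be reproduced from the entries of Tables X and XI. In the sketch below, treating the "Signal Source" rows of Table XI as the inputs to the efficiency uncertainty is an assumption of this illustration; the function name `quadrature` is hypothetical.

```python
import math


def quadrature(percentages):
    """Combine uncorrelated uncertainties (in %) in quadrature."""
    return math.sqrt(sum(p * p for p in percentages))


# Table X: uncertainties (in %) on the background estimate.
background_pct = [3, 5, 5, 4, 4]
bkg_total_pct = quadrature(background_pct)
bkg_error_events = 24.8 * bkg_total_pct / 100.0

# Table XI, "Signal Source" rows (in %), assumed here to be the
# contributions to the signal efficiency uncertainty.
signal_pct = [3, 5, 5, 6, 7, 6, 6, 7, 2]
eff_total_pct = quadrature(signal_pct)
```

The background sum is about 9.5%, or 2.4 events out of 24.8, as quoted. The signal sum is about 16%, close to the relative errors on the efficiency times branching ratio in Table VIII.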


TABLE X. Summary of statistical and systematic uncertainties for the background estimate.

Background Source                         Size of Uncertainty
Statistical Error                         3%
Normalization of the Muon Tag Rate        5%
Functional Form of the Muon Tag Rate      5%
Background Correction for $t\bar{t}$ Signal    4%
Background $E_T$ Scale                    4%


TABLE XI. Summary of statistical and systematic uncertainties for the cross section.

Background Source                         Size of Uncertainty
Statistical Error                         4%
Functional Form of the Muon Tag Rate      7%
Background Correction for $t\bar{t}$ Signal    6%
Background $E_T$ Scale                    9%

Signal Source                             Size of Uncertainty
Statistical Error                         3%
Trigger Turn-on                           5%
Luminosity Error                          5%
Jet Energy Scale                          6%
$t\bar{t}$ Tag Rate                       7%
Model Dependence                          6%
$b \to \mu$ Branching Fraction            6%
$p_T$ Dependence                          7%
$\mathcal{F}$ Dependence                  2%


J. Measured cross section


By fitting the shape of the neural network output distribution, we obtain the $t\bar{t}$ production cross section as a function of the input mass of the top quark. The $t\bar{t}$ cross sections extracted for several values of the top quark mass, along with a function used to interpolate the $t\bar{t}$ cross section (drawn as a smooth curve), are shown in Fig. 31. Interpolating both the cross section and the statistical error, we find $\sigma_{t\bar{t}} = 7.1 \pm 2.8$ (stat) $\pm 1.5$ (syst) pb for $m_t = 172.1$ GeV/$c^2$ [?].

The all-jets cross section can be combined with previous
D0 measurements of the $t\bar{t}$ production cross section,
as extracted from channels where one or both of the W
bosons decay leptonically. This cross section, averaged
over all leptonic channels, was 5.6 ± 1.4 (stat) ±
1.2 (syst) pb at $m_t$ = 172.1 GeV/$c^2$, and is shown superimposed
on Fig. 31. The statistical errors on the all-jets
and leptonic cross section measurements are uncorrelated.
The systematic uncertainties in the following
categories were assumed to be correlated with a correlation
coefficient of 1.0.

• Luminosity. 

• Jet energy scale. 

• Muon tagging efficiency. 

• Non-leptonic trigger efficiency. 

• Top quark generator. 

• $b \to \mu$ branching ratio and muon $p_T$ spectrum.

• Background tag rate function. 

The combined result for the D0 $t\bar{t}$ production cross
section is 5.9 ± 1.2 (stat) ± 1.1 (syst) pb for
$m_t$ = 172.1 GeV/$c^2$.
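To illustrate how correlated systematic uncertainties affect such a combination, the sketch below applies a standard two-measurement best linear unbiased estimate (BLUE) to the all-jets and leptonic results. It is not the procedure used for the quoted result: the category-by-category breakdown is not reproduced here, so the whole systematic error is given a single hypothetical correlation coefficient `rho_syst`, evaluated at its two extremes. Statistical errors are taken as uncorrelated, as stated in the text.

```python
import math

def blue_two(x1, stat1, syst1, x2, stat2, syst2, rho_syst):
    """Best linear unbiased estimate of two measurements.

    Statistical errors are uncorrelated; the total systematic errors
    share one correlation coefficient rho_syst (an assumption of this
    sketch, not a quantity taken from the analysis).
    """
    v11 = stat1**2 + syst1**2            # total variance, measurement 1
    v22 = stat2**2 + syst2**2            # total variance, measurement 2
    v12 = rho_syst * syst1 * syst2       # covariance from systematics only
    denom = v11 + v22 - 2.0 * v12
    w1 = (v22 - v12) / denom             # weight of measurement 1
    mean = w1 * x1 + (1.0 - w1) * x2
    sigma = math.sqrt((v11 * v22 - v12**2) / denom)
    return mean, sigma, w1

all_jets = (7.1, 2.8, 1.5)   # value, stat, syst in pb
leptonic = (5.6, 1.4, 1.2)   # value, stat, syst in pb

for rho in (0.0, 1.0):
    mean, sigma, w1 = blue_two(*all_jets, *leptonic, rho)
    print(f"rho_syst={rho:.0f}: {mean:.2f} +- {sigma:.2f} pb "
          f"(all-jets weight {w1:.2f})")
```

The two extremes give central values of roughly 5.8 and 6.0 pb, which bracket the quoted combined value of 5.9 pb; the all-jets channel carries the smaller weight in both cases because of its larger total uncertainty.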
FIG. 31. The $t\bar{t}$ cross section extracted through fitting the
shapes of the distributions in neural network output to data,
shown as a function of top quark mass. Error bars are statistical
only. For reference, the D0 $t\bar{t}$ cross section and top
quark mass from leptonic channels are shown in the figure
(open square).



FIG. 32. The expected (line) and observed (diamonds) values
of significance of $t\bar{t}$ signal, plotted in terms of Gaussian
equivalent standard deviations. The vertical line corresponds
to the cutoff that is expected to yield the greatest significance.


K. Significance of signal 

In this section, we estimate the significance of the excess
of $t\bar{t}$ signal relative to the expected background. We
define $P$ as the probability of observing at least the number
of observed events ($N_{\rm obs}$) when only background is expected.
The significance of a $t\bar{t}$ signal can then be characterized
by the likelihood that the excess is due to a fluctuation. If
the distribution for the expected number of background
events, $\mu$, is assumed to be a Gaussian with mean $b$ and
systematic uncertainty $\sigma_b$, then $P$ can be calculated
as:

$$
P = \sum_{n=N_{\rm obs}}^{\infty} \int_{0}^{\infty} d\mu \, \frac{e^{-\mu}\mu^{n}}{n!} \, \frac{1}{\sqrt{2\pi}\,\sigma_b} \exp\left[-\frac{(\mu-b)^{2}}{2\sigma_b^{2}}\right]
$$
The optimal choice of selection criteria can be found by
minimizing the expected value of $P$ and, thereby, maximizing
the significance of the excess, assuming that $N_{\rm obs}$
is composed of $t\bar{t}$ signal and background. Both the expected
value and measured value of the significance are
shown, along with the cutoff for greatest significance, in
Fig. 32. The result of the calculation, optimized for significance,
with 18 observed events and an expected background
of 6.9 ± 0.9, is $P$ = 0.0006, corresponding to a
3.2 standard deviation effect. This is sufficient to establish
the existence of a $t\bar{t}$ signal in multijet final states.
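The background-only probability described in this section, a Poisson tail probability averaged over a Gaussian-distributed background mean, can be evaluated numerically. The sketch below is one way to do so and is not taken from the analysis code: the function names, the trapezoidal integration, the integration range, the renormalisation of the Gaussian truncated at zero, and the one-sided conversion to standard deviations are all assumptions.

```python
import math
from statistics import NormalDist

def poisson_tail(n_obs, mu):
    """Probability of observing n >= n_obs for a Poisson mean mu."""
    if mu <= 0.0:
        return 1.0 if n_obs <= 0 else 0.0
    below = sum(
        math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))
        for n in range(n_obs)
    )
    return max(0.0, 1.0 - below)

def background_only_probability(n_obs, b, sigma_b, steps=4000, span=8.0):
    """Average the Poisson tail over a Gaussian background mean.

    Assumptions of this sketch: trapezoidal integration over
    mu in [max(0, b - span*sigma_b), b + span*sigma_b], with the
    Gaussian renormalised on that range.
    """
    lo = max(0.0, b - span * sigma_b)
    hi = b + span * sigma_b
    h = (hi - lo) / steps
    gauss = NormalDist(b, sigma_b)
    weighted, norm = 0.0, 0.0
    for i in range(steps + 1):
        mu = lo + i * h
        w = 0.5 if i in (0, steps) else 1.0   # trapezoid end-point weights
        g = w * gauss.pdf(mu)
        weighted += g * poisson_tail(n_obs, mu)
        norm += g
    return weighted / norm

def gaussian_significance(p):
    """One-sided Gaussian-equivalent number of standard deviations."""
    return NormalDist().inv_cdf(1.0 - p)

# Inputs quoted in the text: 18 observed events, background 6.9 +- 0.9.
p_value = background_only_probability(18, 6.9, 0.9)
print(f"P = {p_value:.4f}, "
      f"significance = {gaussian_significance(p_value):.1f} sigma")
```

With these inputs the sketch gives a probability between 0.0006 and 0.0007 and a significance of about 3.2 standard deviations, close to the values quoted above. Without the Gaussian smearing of the background mean the probability would be roughly half as large, which shows how much the background uncertainty matters at this level.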

We consequently observe an excess in the multijet final
states which we attribute to $t\bar{t}$ production. The cross section
measured is consistent with previous measurements
in other modes of $t\bar{t}$ decay.
VIII. SUMMARY

We have performed a measurement of the $t\bar{t}$ production
cross section in multijet final states. As described above,
we observe an excess of events that can be attributed
to $t\bar{t}$ production. The level of significance of the signal,
as calculated from a possible upward fluctuation of the
background to produce the observed excess, is sufficiently
high to establish independently the existence of a $t\bar{t}$ signal
in the all-jets channel.

Using the D0 measured value of 172.1 GeV/$c^2$ for the
mass of the top quark, we obtain a cross section of 7.1 ±
2.8 (stat) ± 1.5 (syst) pb, which agrees with the D0 cross
section as measured in the leptonic channels. Combining
this result with previous D0 measurements of the
$t\bar{t}$ production cross section gives 5.9 ± 1.2 (stat) ± 1.1
(syst) pb.